Discover Hadoop: 2012

Sunday, January 15, 2012

Get Output Path of Job in Hadoop MapReduce Framework

// context is the Mapper/Reducer Context (a JobContext)
Path outputPath = FileOutputFormat.getOutputPath(context);

How to Avoid Sorting and Partitioning in a Map-Only Job

A MapReduce job can be defined with no reducer. In that case, each mapper writes its output directly to the job's output directory, so there is no sorting and no partitioning.
Just set the number of reduce tasks to 0.

job.setNumReduceTasks(0);

How to Overcome the Java Heap Space Error in Hadoop MapReduce Framework

mapred.map.child.java.opts // heap size for map tasks
mapred.reduce.child.java.opts // heap size for reduce tasks

Configuration conf = new Configuration();
conf.set("mapred.map.child.java.opts", "-Xmx512m");
conf.set("mapred.reduce.child.java.opts", "-Xmx512m");

It will override any existing values.

Or open your conf/mapred-site.xml and set these values:


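<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx1024m</value>
  <description>heap size for map tasks</description>
</property>

<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx1024m</value>
  <description>heap size for reduce tasks</description>
</property>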

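Make sure that (num_of_maps * map_heap_size) + (num_of_reducers * reduce_heap_size) does not exceed the memory available on the node, where num_of_maps and num_of_reducers are the tasks that run at the same time on that node. The maximum number of concurrent mappers and reducers can also be tuned to fit the available system resources.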
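The check above can be sketched as a few lines of plain Java. HeapBudget and its methods are hypothetical helpers written for illustration, not part of Hadoop, and the slot counts and the 8192 MB node size are assumed example values. Note that a JVM uses somewhat more memory than its -Xmx value, so some headroom is sensible.

```java
// Hypothetical helper, not part of Hadoop: applies the heap budget formula above.
final class HeapBudget {

    /** Total heap in MB requested when all task JVMs on a node run at once. */
    static long totalHeapMb(int numMaps, int mapHeapMb, int numReducers, int reduceHeapMb) {
        return (long) numMaps * mapHeapMb + (long) numReducers * reduceHeapMb;
    }

    /** True when the combined task heap fits in the memory available on the node. */
    static boolean fits(int numMaps, int mapHeapMb, int numReducers, int reduceHeapMb,
                        long availableMb) {
        return totalHeapMb(numMaps, mapHeapMb, numReducers, reduceHeapMb) <= availableMb;
    }

    public static void main(String[] args) {
        // Assumed example: 4 concurrent maps and 2 concurrent reducers, each with -Xmx1024m.
        long total = totalHeapMb(4, 1024, 2, 1024);
        System.out.println("Total task heap: " + total + " MB");
        System.out.println("Fits in 8192 MB: " + fits(4, 1024, 2, 1024, 8192));
    }
}
```

With these assumed numbers the tasks request 6144 MB in total, which fits on an 8192 MB node but would not fit on a 4096 MB one.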

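How to Pass a Parameter to the Mapper in Hadoop Map Reduce

In the main method of your program, set the value before the Job is created, because the Job takes a copy of the Configuration: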

String Input_Parameter_Value = "Value_send_to_Mapper";
Configuration conf = HBaseConfiguration.create();
conf.set("Input_Parameter_Name",Input_Parameter_Value);

In the Mapper, read this value in the setup() method:

String value = context.getConfiguration().get("Input_Parameter_Name");
System.out.print("Value: "+value);

Thursday, January 12, 2012

Get File Name Processed by the Mapper in Hadoop Map Reduce

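// The cast is valid when the job uses a FileInputFormat-based input format
FileSplit fileSplit = (FileSplit) context.getInputSplit();
// getName() already returns a String: the last component of the path
String fileNameProcessed = fileSplit.getPath().getName();
System.out.println("File Name Processing " + fileNameProcessed);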
