Sunday, January 15, 2012

Get Output Path of Job in Hadoop MapReduce Framework

// Returns the job's output directory as an org.apache.hadoop.fs.Path
Path outputPath = FileOutputFormat.getOutputPath(context);

How to Avoid Sorting and Partitioning in a Map-Only Job

We can define a MapReduce job with no reducers. In this case, each mapper writes its output directly to the specified job output directory, so there is no sorting and no partitioning.
Just set the number of reduce tasks to 0.

job.setNumReduceTasks(0);

How to Overcome the Java Heap Space Error in Hadoop MapReduce Framework

mapred.map.child.java.opts // heap size for map tasks
mapred.reduce.child.java.opts // heap size for reduce tasks

Configuration conf = new Configuration();
conf.set("mapred.map.child.java.opts", "-Xmx512m");
conf.set("mapred.reduce.child.java.opts", "-Xmx512m");
These settings override any existing values.

Or

Open your conf/mapred-site.xml and set these values:
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx1024m</value>
  <description>heap size for map tasks</description>
</property>

<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx1024m</value>
  <description>heap size for reduce tasks</description>
</property>

Make sure that (num_of_maps * map_heap_size) + (num_of_reducers * reduce_heap_size) does not exceed the memory available on the system, where num_of_maps and num_of_reducers are the tasks that can run at the same time. The maximum number of mappers and reducers can also be tuned to match the available system resources.
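The check above is simple arithmetic, and a small helper makes it easy to test different settings. This is only a sketch: the class name HeapBudget, its method names, and the node figures (4 map slots, 2 reduce slots, 8192 MB of RAM) are made up for illustration and are not part of Hadoop. It also counts heap only, so a real node needs extra headroom for JVM overhead and the Hadoop daemons.

```java
// Hypothetical helper, not part of Hadoop: checks the heap budget rule of thumb.
class HeapBudget {

    // Worst-case heap in MB if every map and reduce slot is busy at once.
    static long totalHeapMb(int numMaps, int mapHeapMb, int numReducers, int reduceHeapMb) {
        return (long) numMaps * mapHeapMb + (long) numReducers * reduceHeapMb;
    }

    // True if the worst-case heap does not exceed the memory available for tasks.
    static boolean fits(int numMaps, int mapHeapMb, int numReducers, int reduceHeapMb,
                        long availableMb) {
        return totalHeapMb(numMaps, mapHeapMb, numReducers, reduceHeapMb) <= availableMb;
    }

    public static void main(String[] args) {
        // Assumed example node: 4 map slots and 2 reduce slots, -Xmx1024m each, 8192 MB RAM.
        long total = totalHeapMb(4, 1024, 2, 1024);
        System.out.println("Worst-case task heap: " + total + " MB"); // 6144 MB
        System.out.println("Fits in 8192 MB: " + fits(4, 1024, 2, 1024, 8192)); // true
    }
}
```

If the check fails, either lower the -Xmx values or reduce the number of task slots on the node.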

How to Pass a Parameter to the Mapper in Hadoop MapReduce

In the main method of your program:

String inputParameterValue = "Value_send_to_Mapper";
Configuration conf = HBaseConfiguration.create();
// Set the value before creating the Job, which takes its own copy of the configuration
conf.set("Input_Parameter_Name", inputParameterValue);

In the Mapper, read this value in the setup method:

String value = context.getConfiguration().get("Input_Parameter_Name");
System.out.print("Value: "+value);

Thursday, January 12, 2012

Get the Name of the File Processed by the Mapper in Hadoop MapReduce


// The cast works when the input format produces FileSplit instances (e.g. TextInputFormat)
FileSplit fileSplit = (FileSplit) context.getInputSplit();
String fileName = fileSplit.getPath().getName();
System.out.println("Processing file: " + fileName);