Wednesday, July 22, 2009

Hadoop: java.io.IOException: Type mismatch in key from map

We've been working with Hadoop for a while now, and newcomers almost inevitably hit this error the first time they write their own Hadoop job. If you are seeing it, the most likely cause is a mismatch between the types your Map and/or Reduce implementation emits and the types declared in the job configuration.
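To make the failure mode concrete, here is a minimal sketch of the kind of check that produces the error. It is a simplified stand-in, not Hadoop's actual source: the class MapKeyCheckSketch, the method checkKey, and the empty Text and LongWritable classes are all hypothetical, defined here only so the example runs without Hadoop on the classpath. The message text is modeled on the error in the title.

```java
import java.io.IOException;

class MapKeyCheckSketch {
    // Hypothetical stand-ins for Hadoop's Text and LongWritable types.
    static class Text {}
    static class LongWritable {}

    // Simplified model of the check applied to each record a mapper emits:
    // the key's runtime class must be exactly the class the job was configured with.
    static void checkKey(Class<?> expected, Object key) throws IOException {
        if (key.getClass() != expected) {
            throw new IOException("Type mismatch in key from map: expected "
                    + expected.getName() + ", received " + key.getClass().getName());
        }
    }

    public static void main(String[] args) {
        try {
            // The job expects LongWritable keys, but the mapper emits a Text key.
            checkKey(LongWritable.class, new Text());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the comparison is against the configured class, not against whatever your Mapper declares, so a perfectly valid Mapper still fails if the configuration says something else.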

Your Map implementation probably looks something like this:

public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    ...
  }
}


Your map and reduce phases can emit different output types, and that is usually what causes the problem. If you do not declare the map output types explicitly, Hadoop assumes they are the same as the job's output types (which themselves default to LongWritable keys and Text values), so the first record the mapper emits fails the type check. If your phases produce different types, declare both sets of output classes on the JobConf when configuring the job:

// Set the outputs for the Map
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);

// Set the outputs for the Job
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(ArrayWritable.class);
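Why the two setMapOutput... calls matter can be illustrated with a small sketch of the fallback rule: when no map output class has been set, the job output class is used in its place. This is a simplified model under that assumption, not the JobConf implementation; OutputClassFallbackSketch, resolveMapOutputClass, and the empty IntWritable and ArrayWritable classes are hypothetical stand-ins so the example runs without Hadoop.

```java
class OutputClassFallbackSketch {
    // Hypothetical stand-ins for Hadoop's IntWritable and ArrayWritable types.
    static class IntWritable {}
    static class ArrayWritable {}

    // Simplified model of the fallback: use the explicitly set map output class
    // if there is one, otherwise fall back to the job output class.
    static Class<?> resolveMapOutputClass(Class<?> mapOutputClass, Class<?> jobOutputClass) {
        return mapOutputClass != null ? mapOutputClass : jobOutputClass;
    }

    public static void main(String[] args) {
        // Only the job output value class is set: the mapper is then expected
        // to emit ArrayWritable values, although it actually emits IntWritable.
        Class<?> withoutSetter = resolveMapOutputClass(null, ArrayWritable.class);

        // The map output value class is also set: the expectation matches the mapper.
        Class<?> withSetter = resolveMapOutputClass(IntWritable.class, ArrayWritable.class);

        System.out.println("without map output setter: " + withoutSetter.getSimpleName());
        System.out.println("with map output setter:    " + withSetter.getSimpleName());
    }
}
```

With the configuration above, dropping the map output lines would leave the mapper's IntWritable values being checked against ArrayWritable, which is the same kind of mismatch as the key error in the title.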


Hope that saves people some time.
