The first step is to create the output format class. You need to know the form returned by the reduce function. In my case, the reducer sends a Text key with a list of integers associated with it, so I need to define the generic type parameters accordingly. The simple class looks like this:
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyTextOutputFormat extends FileOutputFormat<Text, List<IntWritable>> {
    @Override
    public RecordWriter<Text, List<IntWritable>> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        // get the output directory configured for the job
        Path path = FileOutputFormat.getOutputPath(context);
        // build the full path: the output directory plus our file name
        Path fullPath = new Path(path, "result.txt");
        // create the file in the file system
        FileSystem fs = path.getFileSystem(context.getConfiguration());
        FSDataOutputStream fileOut = fs.create(fullPath, context);
        // hand the new file to our record writer
        return new MyCustomRecordWriter(fileOut);
    }
}
After figuring out the full path and creating our output file, we then need to create an instance of the actual record writer to pass back to the reduce job. That class lets us write to the file however we want. Again, the type parameters need to match the form coming from the reducer. My class looks like this:
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class MyCustomRecordWriter extends RecordWriter<Text, List<IntWritable>> {
    private final DataOutputStream out;

    public MyCustomRecordWriter(DataOutputStream stream) throws IOException {
        out = stream;
        // write a header line at the top of the file
        out.writeBytes("results:\r\n");
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException, InterruptedException {
        // close our file
        out.close();
    }

    @Override
    public void write(Text key, List<IntWritable> values) throws IOException, InterruptedException {
        // write out our key
        out.writeBytes(key.toString() + ": ");
        // loop through all values associated with our key and write them with commas between
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.writeBytes(",");
            }
            out.writeBytes(String.valueOf(values.get(i)));
        }
        out.writeBytes("\r\n");
    }
}
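To sanity-check the line format that write() produces, the same comma-joining logic can be sketched in plain Java without the Hadoop types. The FormatDemo class and formatLine helper below are my own names for illustration, not part of the code above:

```java
import java.util.Arrays;
import java.util.List;

public class FormatDemo {
    // mirrors write(): key, colon-space, comma-separated values, CRLF
    static String formatLine(String key, List<Integer> values) {
        StringBuilder sb = new StringBuilder(key).append(": ");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append(values.get(i));
        }
        return sb.append("\r\n").toString();
    }

    public static void main(String[] args) {
        // prints "apple: 3,1,4" followed by CRLF
        System.out.print(formatLine("apple", Arrays.asList(3, 1, 4)));
    }
}
```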
Finally, we need to tell our job about our output format and the path before running it.
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(ArrayList.class);
job.setOutputFormatClass(MyTextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("/home/hadoop/out"));
And that's it. Two simple classes allow us all the control we need.
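For reference, a reducer whose output types line up with this record writer could look something like the sketch below. This is only an illustration under my assumptions; the class name MyReducer and the exact aggregation are mine, not from the job described above:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, IntWritable, Text, List<IntWritable>> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // collect the values into the List our record writer expects
        List<IntWritable> list = new ArrayList<>();
        for (IntWritable value : values) {
            // Hadoop reuses the IntWritable instance across iterations, so copy it
            list.add(new IntWritable(value.get()));
        }
        context.write(key, list);
    }
}
```

Note that the job's output key and value classes (Text and ArrayList above) must match what this reducer emits.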