Thursday, January 12, 2012

Reading from HBase in Hadoop

A useful application of a MapReduce job is to pull data from a large data store and summarize it into a smaller one. I decided to try moving data out of HBase into another store. Using the current client library it is pretty easy. I started with a simple table called baseballscores. The first row, with row key 'game1', looks like this:

hbase(main):001:0> get 'baseballscores', 'game1'
COLUMN         CELL
 loser:score   timestamp=1325913333740, value=2
 loser:team    timestamp=1325913325984, value=Rays
 winner:score  timestamp=1325913306939, value=5
 winner:team   timestamp=1325913295557, value=Athletics


The table has two column families ('winner' and 'loser') with two columns each. To read from this table, we first need to set up our configuration. The TableMapReduceUtil class helps wire up the mapper job: we pass it the table name, a Scan describing what to read, the mapper class, and the mapper's output key and value class types.


//create our configuration
Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "HBaseReader");
job.setJarByClass(HBaseMapReduce.class);
Scan scan = new Scan();

//set up mapper job to use baseballscores table
TableMapReduceUtil.initTableMapperJob("baseballscores", scan, HBaseMapper.class, Text.class, Text.class, job);
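The snippet above configures the map side but never actually launches the job. A hedged sketch of the remaining driver code, using the standard Hadoop Job API (the ScoreReducer class name and output path are my own placeholders, not from the original post):

```java
// Hypothetical continuation of the driver: wire in a reducer and run the job.
// ScoreReducer and the output path are placeholder names.
job.setReducerClass(ScoreReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileOutputFormat.setOutputPath(job, new Path("/tmp/baseball-output"));

// Submit the job and block until it finishes, printing progress.
boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);
```

If you only wanted the map output, you could instead call job.setNumReduceTasks(0) and skip the reducer entirely.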


Now we can create our mapper class to pull the data from the HBase table. The Result parameter holds the row's column data. To get a specific column, we call values.getValue and pass the column family name followed by the column qualifier. Here we extract the winning and losing team names and associate each with a 'win' or 'loss' key so we can total them later.

static class HBaseMapper extends TableMapper<Text, Text> {

    private static final Text WIN = new Text("win");
    private static final Text LOSS = new Text("loss");

    @Override
    public void map(ImmutableBytesWritable row, Result values, Context context)
            throws IOException, InterruptedException {

        Text winningTeam = new Text(Bytes.toString(
                values.getValue(Bytes.toBytes("winner"), Bytes.toBytes("team"))));
        Text losingTeam = new Text(Bytes.toString(
                values.getValue(Bytes.toBytes("loser"), Bytes.toBytes("team"))));

        context.write(WIN, winningTeam);
        context.write(LOSS, losingTeam);
    }
}



And that's it. You can write the reducer of your choice and put the data anywhere you like.
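Since the reducer is left open-ended, here is one minimal sketch that totals the wins and losses per key, using the standard Hadoop Reducer API (the ScoreReducer class name is my own, not from the original post):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Counts how many team names arrive under each key ("win" or "loss").
// With the mapper above this totals games won and lost overall; to count
// per team, the mapper would emit the team name as the key instead.
public class ScoreReducer extends Reducer<Text, Text, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<Text> teams, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (Text team : teams) {
            count++;
        }
        context.write(key, new IntWritable(count));
    }
}
```

Note that because the mapper emits Text values, the reducer's input value type must also be Text; the counts go out as IntWritable.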
