Thursday, January 12, 2012

Reading from HBase in Hadoop

A useful application of a MapReduce job is to pull data from a large data store and summarize it into a smaller one. I decided to try moving data out of HBase into another store. Using the current client library it is pretty easy. I started with a simple table called baseballscores. The first row, with row key 'game1', looks like this:

hbase(main):001:0> get 'baseballscores', 'game1'
COLUMN         CELL
 loser:score   timestamp=1325913333740, value=2
 loser:team    timestamp=1325913325984, value=Rays
 winner:score  timestamp=1325913306939, value=5
 winner:team   timestamp=1325913295557, value=Athletics


The table has two column families ('winner' and 'loser') with two columns each. To read from this table, we first need to set up our configuration. The TableMapReduceUtil class helps wire up the mapper job: we pass it the table name, a Scan describing what to read, the mapper class, and the mapper's output key and value class types.


//create our configuration
Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "HBaseReader");
job.setJarByClass(HBaseMapReduce.class);
Scan scan = new Scan();

//set up mapper job to use baseballscores table
TableMapReduceUtil.initTableMapperJob("baseballscores", scan, HBaseMapper.class, Text.class, Text.class, job);
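The snippet above configures the map side but never actually launches the job. A hedged sketch of the remaining driver code, using the standard Hadoop Job API (the ScoreReducer class name and output path are my own placeholders, not from the original post):

```java
// Hypothetical continuation of the driver: wire in a reducer and run the job.
// ScoreReducer and the output path are placeholder names.
job.setReducerClass(ScoreReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileOutputFormat.setOutputPath(job, new Path("/tmp/baseball-output"));

// Submit the job and block until it finishes, printing progress.
boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);
```

If you only wanted the map output, you could instead call job.setNumReduceTasks(0) and skip the reducer entirely.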


Now we can create our mapper class to pull the data from the HBase table. The Result parameter holds the row's column data. To get a specific column, we call values.getValue and pass the column family name followed by the column qualifier. Here we extract the winning and losing team names and associate each with a 'win' or 'loss' key so we can total them later.

static class HBaseMapper extends TableMapper<Text, Text> {

    private static final Text WIN = new Text("win");
    private static final Text LOSS = new Text("loss");

    @Override
    public void map(ImmutableBytesWritable row, Result values, Context context)
            throws IOException, InterruptedException {

        Text winningTeam = new Text(Bytes.toString(
                values.getValue(Bytes.toBytes("winner"), Bytes.toBytes("team"))));
        Text losingTeam = new Text(Bytes.toString(
                values.getValue(Bytes.toBytes("loser"), Bytes.toBytes("team"))));

        context.write(WIN, winningTeam);
        context.write(LOSS, losingTeam);
    }
}



And that's it. You can write the reducer of your choice and put the data anywhere you like.
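Since the reducer is left open-ended, here is one minimal sketch that totals the wins and losses per key, using the standard Hadoop Reducer API (the ScoreReducer class name is my own, not from the original post):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Counts how many team names arrive under each key ("win" or "loss").
// With the mapper above this totals games won and lost overall; to count
// per team, the mapper would emit the team name as the key instead.
public class ScoreReducer extends Reducer<Text, Text, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<Text> teams, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (Text team : teams) {
            count++;
        }
        context.write(key, new IntWritable(count));
    }
}
```

Note that because the mapper emits Text values, the reducer's input value type must also be Text; the counts go out as IntWritable.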
