Saturday, January 14, 2012

Writing to MySQL in Hadoop

Continuing from the previous post, I will now create a Reducer that writes the baseball results to a MySQL database. This example is mainly helpful for writing summary data and not for tracking on going data as handling keys is tricky. For that you can write a custom output format that uses its own writer to communicate to the database of your choice. For now we will use a simple reducer to write to a simple table. The table I'm writing to looks like this:


mysql> desc records;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| team | varchar(100) | YES | | NULL | |
| wins | int(11) | YES | | NULL | |
| loses | int(11) | YES | | NULL | |
+-------+--------------+------+-----+---------+-------+
3 rows in set (0.59 sec)


All we care about is the team name and the wins and losses from the source feed. To start out the example we will show the configuration setup.


DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://127.0.0.1/baseball", "visitor","vispass");
//job set up code here
...

//set up the output format with our table name "records" and our three column names
String [] fields = { "team", "wins", "loses"};
DBOutputFormat.setOutput(job, "records", fields);
//tell the job what reducer to use
job.setReducerClass(ScoreReducer.class);


The configuraton line specifies our local MySQL url and the name of the database called baseball along with the user name and password. It is important that this configuration line be placed before the creation of the Job class or the settings will not be persisted to the reducer and output format class. The reducer class takes out Text team key and list of integer values and turns them in to win and loss totals, placing them in a database record.


public static class ScoreReducer extends Reducer<Text, IntWritable, BaseballDBWritable, Text> {

@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

int wins = 0;
int losses = 0;

//get iterator so we can loop through wins and loses
Iterator<IntWritable> vals = values.iterator();

int sum = 0;
while(vals.hasNext()) {
int n = vals.next().get();

if (n == 0) //0 indicates a loss
losses++;
else //1 indicates a win
wins++;

}
//create our record object to persist the team's wins and losses
BaseballDBWritable record = new BaseballDBWritable(key.toString(), wins, losses);
context.write(record, key);
}

}


To write out to a database, our key class needs to implement the DBWritable class. In this case I've created a custom one called BaseballDBWritable which holds the fields for each record. The ordering of the database record as the key is necessary due to the way Hadoop passes the information off to the output. The value class here is unimportant and not used. The custom writable class looks like this:


public static class BaseballDBWritable implements Writable, DBWritable {

String team;
int wins;
int losses;

public BaseballDBWritable(String team, int wins, int losses) {
this.team = team;
this.wins = wins;
this.losses = losses;
}

@Override
public void readFields(ResultSet arg0) throws SQLException {
}

@Override
public void write(PreparedStatement arg0) throws SQLException {
arg0.setString(1, team);
arg0.setInt(2, wins);
arg0.setInt(3, losses);
}

@Override
public void readFields(DataInput arg0) throws IOException {
}

@Override
public void write(DataOutput arg0) throws IOException {
}
}


All we do here is add our values to the prepared statement created behind the scenes. The default database output format class will handle creating and executing the surrounding sql. Once all this is done your job is ready to go. Assuming a blank table from the start and a number of rows in our source database, the output records should look like this:


mysql> select * from records;
+-----------+------+-------+
| team | wins | loses |
+-----------+------+-------+
| Astros | 1 | 0 |
| Athletics | 2 | 0 |
| Dodgers | 0 | 1 |
| Giants | 1 | 0 |
| Marlins | 1 | 0 |
| Mets | 0 | 1 |
| Padres | 0 | 1 |
| Phillies | 1 | 0 |
| Rays | 0 | 1 |
| Red Sox | 0 | 1 |
| Reds | 0 | 1 |
| Yankees | 1 | 1 |
+-----------+------+-------+
12 rows in set (0.03 sec)

31 comments:

  1. Thanks so very much for taking your time to create this very useful and informative site. I have learned a lot from your site. Thanks!!

    Hadoop Course in Chennai

    ReplyDelete
  2. Thanks for sharing this informative blog. If anyone wants to get Big Data Training Chennai visit fita academy located at Chennai, which offers best Hadoop Training Chennai with years of experienced professionals.

    ReplyDelete
  3. Hi I am Victoria lives in Chennai. I am a technology freak. Recently I did Java Training in Chennai at a leading Java Training Institutes in Chennai. This is really helpful for me to make a bright career in IT industry.

    ReplyDelete
  4. Dot Net Training Chennai

    Thanks for your wonderful post.It is really very helpful for us and I have gathered some important information from this blog.If anyone wants to get Dot Net Training in Chennai reach FITA, rated as No.1 Dot Net Training Institute in Chennai.

    Dot Net Course in Chennai

    ReplyDelete
  5. Testing Training in Chennai

    Hi, I have read your blog. Your information is really useful for me.Thanks for sharing this blog. I did QTP Course in Chennai at Fita training and placement academy which offer best Selenium Course in Chennai with years of experienced professionals. This is really useful for me to make a bright career.

    ReplyDelete
  6. QTP Training in Chennai

    Hi, I wish to be a regular contributor of your blog. I have read your blog. Your information is really useful for beginner. I did Software Testing Course in Chennai at Fita training and placement academy which offer best Software Testing Training in Chennai with years of experienced professionals. This is really useful for me to make a bright career.

    Regards...

    Software Testing Training Institutes in Chennai

    ReplyDelete
  7. HTML5 Training

    Hi, Thanks for sharing this valuable blog.I was really impressed by reading this blog. I did HTML5 Training in Chennai at reputed HTML5 Training Institutes in Chennai. This is really useful for me to make a bright future in designing field.

    Best HTML5 Training in Chennai

    ReplyDelete
  8. Best Big Data Hadoop Training in Hyderabad @ Kalyan Orienit

    Follow the below links to know more knowledge on Hadoop

    WebSites:
    ================
    http://www.kalyanhadooptraining.com/

    http://www.hyderabadhadooptraining.com/

    http://www.bigdatatraininghyderabad.com/

    Videos:
    ===============
    https://www.youtube.com/watch?v=-_fTzrgzVQc

    https://www.youtube.com/watch?v=Df2Odze87dE

    https://www.youtube.com/watch?v=AOfX-tNkYyo

    https://www.youtube.com/watch?v=Cyo3y0vlZ3c

    https://www.youtube.com/watch?v=jOLSXx6koO4

    https://www.youtube.com/watch?v=09mpbNBAmCo

    ReplyDelete
  9. brillant piece of information, I had come to know about your web-page from my friend hardkik, chennai,i have read atleast 9 posts of yours by now, and let me tell you, your webpage gives the best and the most interesting information. This is just the kind of information that i had been looking for, i'm already your rss reader now and i would regularly watch out for the new posts, once again hats off to you! Thanx a million once again, Regards, obiee training in hyderabad,informatica training in hyderabad


    ReplyDelete
  10. Best SQL Query Tuning Training Center In Chennai This information is impressive; I am inspired with your post writing style & how continuously you describe this topic. After reading your post, thanks for taking the time to discuss this, I feel happy about it and I love learning more about this topic..

    ReplyDelete
  11. A1 Trainings as one of the best training institute in Hyderabad for online trainings for Hadoop. We have expertise and real time professionals working in Hadoop since 7 years. Our training strategy and materials will help the students for the certification exams also.

    Hadoop Training in Hyderabad

    ReplyDelete
  12. Great Blog Thanks for sharing ...... such a helpful information keep sharing these type of blogs.
    Hadoop Training in Hyderabad

    ReplyDelete
  13. I am happy to find this post Very useful for me, as it contains lot of information

    Education
    Technology

    ReplyDelete
  14. I am obliged to you for sharing this piece of information here and updating us with your resourceful guidance. Hope this might benefit many learners. Keep sharing this gainful articles and continue updating us.
    Hadoop Training in Chennai
    Cloud computing Training
    Cloud computing Training institutes in Chennai
    Best Cloud computing Training in Chennai
    Big Data Course in Chennai
    Big Data Hadoop Training in Chennai

    ReplyDelete
  15. Salesforce CRM programming will assist organizations with having a short outline about how their rivals in the comparative specialty are performing and accordingly helping them in fostering an answer that would help in productive administration. Salesforce training with placement in Noida

    ReplyDelete
  16. If You Are Looking For Forex Broker? Read This Review And Find Out How Much I've Enjoyed My Experience With ETRADE LOG ON

    ReplyDelete
  17. Very Informative blog thank you for sharing. Keep sharing.

    Best software training institute in Chennai. Make your career development the best by learning software courses.

    rpa certification in chennai
    php training in chennai
    rpa uipath training in chennai

    ReplyDelete