Thursday, October 3, 2013

Building Sites Quickly with Node.js and AngularJS

There are more and more options for getting sites up and running quickly. This week I thought I'd combine two options that I have worked with previously, Node.js and AngularJS. Using my EC2 server, I decided to work out a few demonstrations.

At its simplest, Node.js acts as an HTTP server that can serve simple content to a caller and maintain persistent connections for messaging or chatting. Through the power of its modules, it can be used to build powerful sites. We'll start by creating a simple one using the express module. Express is a great module that builds on the connect module, giving us features for creating simple or complex sites as well as REST services.

Creating a Simple Page

If it's not installed already, Node.js can be installed using:

sudo yum install npm

Once installed, the first step is to create a directory for your project. Then add a file called package.json to list our dependencies. It can look like this:

{
  "name": "TestSite",
  "description": "My Test Site",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "express": "3.4",
    "jade": "0.35.0"
  }
}

Once that is created, run:

npm install

This will fetch our dependencies. You could also run these separately:

npm install express
npm install jade

Now we are ready to create some simple content. First create a file called index.html. It can be simple and look like this:
<html>
<body>
  <div>
    <h3>Hello from Node.js</h3>
  </div>
</body>
</html>

Now we can create our server-side JavaScript. We create a file called server.js which looks like this:

var express = require("express");
var app = express();
app.use(express.static(__dirname));
app.listen(8888);

Here we tell Node to use the express module and we create a new Express application. We then tell it to serve static content from our base directory, referenced with the __dirname variable, and to listen on port 8888. Now we can start the Node server with the following:

node server.js

You will now be able to request your index.html page from the root of your server on port 8888, for example:

http://mytestsite.com:8888/index.html

Adding Asynchronous Calls with AngularJS

Now that we have a simple page hosted, we can add JavaScript and asynchronous calls to it. To accomplish this we will use AngularJS. AngularJS is an MVVM framework in JavaScript that makes it easy to bind data and events to your views, fetch data asynchronously, validate your UIs and a lot more. We can download it to our server like this:

wget https://ajax.googleapis.com/ajax/libs/angularjs/1.0.8/angular.min.js

To use it, we will first modify our index.html page. Using extra attributes on our HTML elements, we can tell Angular what controller we want to hook up and what events and model items our controls map to:

<html ng-app>
<head>
  <script src="angular.min.js"></script>
  <script src="index.js"></script>
</head>
<body>
  <h3>Hello from Node.js</h3>
  <div ng-controller="TestController">
    <span ng-bind="txtdata"></span><br/>
    <button ng-click="getData()">Get Message</button>
  </div>
</body>
</html>

Here we took our original page and added some markup. The ng-app attribute tells Angular this page is an application, while the ng-controller attribute specifies which controller to use. To give dynamic content to our span tag, we use ng-bind. Finally, we use ng-click to link a button to an event in the controller. We will use this event to fill data into the span tag. Once the modifications are done, we can create the JavaScript that will contain the AngularJS code. Inside index.js we put:

function TestController($scope,$http) {
  $scope.txtdata = "";
  $scope.getData = function() {
    $http.get('/getdata/').success(function(result) {
      $scope.txtdata = result.message;
    });
  }
}

This code defines our controller TestController, which matches what we are looking for in our HTML. We define the data we want to bind to, txtdata, and the function getData that our button click binds to. Inside the function we call a URL on our site, /getdata, which we will add to Node next. The result is then stored in the txtdata variable. To handle the /getdata call on the server side, we'll modify our server.js file. We'll simulate an object to return, but it could just as easily come from a database call:

function MyCustomObject() {
  this.message = "Test message";
}

var myObj = new MyCustomObject();
 
app.get('/getdata', function(req, res) {
  res.send(myObj);
});

This code tells our app to listen at /getdata and return a JSON object with a message property. Now you can run the node server again and re-test the index.html page. Clicking the button should force a call back to the Node server and return the JSON with the message property.
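For reference, the JSON body Express sends back for our object should look like this:

{
  "message": "Test message"
}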

Using Jade

Jade is a templating engine that Express can use to render markup as HTML, and it is easy to use. To duplicate the index.html file from above, we can create index.jade in our root directory and fill it in like this:

doctype 5
html(lang="en" ng-app)
  body
    h3 Hello from Node.js
    #container(ng-controller="TestController")
      span(ng-bind="txtdata")
      br
      button(ng-click="getData()").
        Get Message

    script(src='angular.min.js')
    script(src='index.js')

This markup will create all the HTML tags for us and add the attributes listed in the parentheses. At the end we include our JavaScript files from earlier, though they could also be placed in the header. To hook up the Jade rendering so we can display this, we can add the following to our server.js file:

app.set('view engine', 'jade');
app.set('views', __dirname);
app.get('/index', function(req, res) {
  res.render('index.jade');
});

Now if we run this and go to the /index path, it will render a page identical to index.html, with a working Angular controller.

So hopefully this shows how quick and powerful Node.js and AngularJS are. For fans of JavaScript, they are a nice starting point for quickly getting a site up and running. This demonstration can easily be extended by using modules such as Mongoose or Helenus to connect to databases.

Wednesday, September 25, 2013

Securing Your Django Rest Service with OAuth2

Once you have a REST service that you want others to access, you probably want to secure it. There are many ways this can be done, but I thought I'd look into the OAuth standard. OAuth is a specification for authorization that provides patterns for distributing access tokens which can be checked to protect server-side resources.

For example, in a simple case, you may have one server trying to access a service or resource on another server. First, the calling server would create an account on the server owning the service and obtain its own client id and client secret, which would need to be stored securely. Once it has its own secret, it can send its id and secret to the owning service for an authorization check and receive an access token in exchange. That access token can then be used to retrieve data from the server.

In a more complex example, a mobile app may want to pull data from a REST service. A mobile app cannot truly store a client secret and consider it private, so extra steps need to be taken. To accomplish this, an account would need to exist on the server owning the service for any user who wants access. The mobile app would first direct the user to a login page on the remote server owning the service, where they either log in or create an account if one doesn't exist. Once they have an account and enter their credentials, the server will redirect back to a URL specified by the app and pass it an authorization code. The app listens for the redirect and takes the authorization code from it. Now the app can call the server again, passing its client id, client secret and authorization code, and get an OAuth token back. The mobile app can then use this token to make requests to the service. The login page and redirect URL are important: we don't want the mobile app to need to store the username or password, and we don't want to pass the authorization code back to just anyone. It is usually recommended to register the redirect URL with the remote server to prevent third parties from passing random URLs that they control.

For more details on these scenarios, refer to the OAuth2 spec (RFC 6749).

For this tutorial we will focus on the simple case of a website or service requesting data from another service. Our consumer service is assumed to be securely storing its client id and secret. As OAuth2 is just a specification, there are multiple implementation libraries out there. For this demonstration we will use the django-oauth2-provider (https://github.com/caffeinehit/django-oauth2-provider). Assuming you have pip, you can install it like this:

sudo pip install django-oauth2-provider

Once we have the provider installed, we want to allow django to use it. Modify settings.py by adding to INSTALLED_APPS:

INSTALLED_APPS = (
   ...
   'provider',
   'provider.oauth2',
   ...
)

Now we want to add the following entry to urls.py:

...
url(r'^oauth2/', include('provider.oauth2.urls', namespace = 'oauth2')),
...

This code sets up a URL to route token requests for OAuth2. We will use that shortly. First we need to set up a client id and secret that we can use to request a token. For this we will use the Django shell and a backend datastore to store our information. In my case, Django is configured to use my local MySQL database. Make sure your database is running and enter the shell like this:

python manage.py shell

Now in the shell you can enter a script to create a new user, associate it with a client, and get the client's id and secret:

from provider.oauth2.models import Client
from django.contrib.auth.models import User
user = User.objects.create_user('john', 'john@djangotest.com', 'abcd')
c = Client(user=user, name="mysite client", client_type=1, url="http://djangotest.com")
c.save()

Then you can use c.client_id and c.client_secret to get the information for the new client. In this case, the id is:

'c513118ee3b176805722'
and the secret is:
'd4e5bca2996c8c543349cf0ce140bcd73c86450c'

With this information we can now make calls to the URL we set up above to get a token. For example, if your site is http://johnssite.com and Django is running on port 80, then you would send a POST request to:

http://johnssite.com/oauth2/access_token

The data to post would look like this:

client_id=c513118ee3b176805722&client_secret=d4e5bca2996c8c543349cf0ce140bcd73c86450c&grant_type=password&username=john&password=abcd&scope=write

This should return a token you can use for your requests.
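As a quick sketch of the consumer side (assuming the requests library and the example values above; the access_token field name comes from the OAuth2 spec), the call could be made from Python like this:

import requests

# Hypothetical consumer-side token request using the example client values above
resp = requests.post('http://johnssite.com/oauth2/access_token', data={
    'client_id': 'c513118ee3b176805722',
    'client_secret': 'd4e5bca2996c8c543349cf0ce140bcd73c86450c',
    'grant_type': 'password',
    'username': 'john',
    'password': 'abcd',
    'scope': 'write',
})

# The response body is JSON containing the token
token = resp.json()['access_token']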

Now we can modify our service to make use of incoming tokens. The first step is to make sure HTTPS is on for your service. This is important to protect the information you are sending to the authorization service. Next we will modify our service to expect authentication. We can add the following code to what we previously had in views.py for our REST service:

import datetime
from provider.oauth2.models import AccessToken
...
if request.method == 'GET':
    restaurants = dbconn['restaurants']
    rests = []

    key = request.META.get('HTTP_AUTHORIZATION')
    if not key:
        return Response({ 'Error': 'No token provided.'})
    else:
        try:
            token = AccessToken.objects.get(token=key)
            if not token:
                return Response({'Error': 'Access denied.'})

            if token.expires < datetime.datetime.now():
                return Response({'Error': 'Token has expired.'})
        except AccessToken.DoesNotExist:
            return Response({'Error': 'Token does not exist.'})

    ...

In this code we first take the token that should have been passed in the Authorization header. If you are expecting it as an argument instead, you can modify the code accordingly. If we have a token, we check that it exists in the token store that issued it and then check that it hasn't expired. If all tests pass, the code flow continues to get the data and return it.

So that's it. Using the django-oauth2-provider library makes it pretty straightforward to create a token authorization service. With only a little extra effort we added basic security to our site.

Thursday, January 10, 2013

Communicating With Hadoop Through RabbitMQ

For those who need to start Hadoop jobs from other systems, RabbitMQ is an easy-to-use option. RabbitMQ is an open source, freely available messaging system that works with many languages such as Java, Ruby, Python and C#. It's simple to install and to connect machines across a network. We'll start by setting up RabbitMQ on the Linux machine that will run our Hadoop job. Install guides can be found here: http://www.rabbitmq.com/download.html. For Ubuntu, I simply used the following command:

sudo apt-get install rabbitmq-server

You can check that the broker is installed and running using this command:

rabbitmqctl status

The broker's job is to listen for messages coming from this or other machines and route them to the proper queues. If the broker is not running, your machine will not be able to communicate with other queues. Once you have determined the service is running, you can create your listener program. The listener's job is to wait for messages sent to our queue and start the appropriate job upon receipt. Your listening program can be as fancy as you like; for testing purposes you can just create a basic Java application with a main entry point. Before writing the program you will need to download the client libraries from here:

http://www.rabbitmq.com/java-client.html

When creating your program you will need to add references to the jar files rabbitmq-client.jar, commons-cli-1.1.jar and commons-io-1.2.jar. Listening to a message queue doesn't take much code. First we set up our connection to our local broker:

    ConnectionFactory factory = new ConnectionFactory();
    factory.setHost("localhost");
    Connection connection = factory.newConnection();
    Channel channel = connection.createChannel();
Next we specify the queue to listen to, which will be created if it doesn't already exist.
    //create the queue if it doesn't already exist
    channel.queueDeclare(queueName, false, false, false, null);            
    QueueingConsumer consumer = new QueueingConsumer(channel);
    //listen to the queue
    channel.basicConsume(queueName, true, consumer);
Finally we can retrieve the message like this:
    QueueingConsumer.Delivery delivery = consumer.nextDelivery();
    String message = new String(delivery.getBody());

Once we retrieve the message we can determine a course of action. In this case, we wish to start our job using the tool runner:

ToolRunner.run(new Configuration(), new MyMapReduceJob(), args);

The full client code, with the required imports and a main entry point, looks like this:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.QueueingConsumer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;

    public class JobListener {
        public static void main(String[] args) {
            try
            {
                String queueName = "TESTQUEUE";
                ConnectionFactory factory = new ConnectionFactory();
                factory.setHost("localhost");
                Connection connection = factory.newConnection();
                Channel channel = connection.createChannel();

                //create the queue if it doesn't already exist
                channel.queueDeclare(queueName, false, false, false, null);

                QueueingConsumer consumer = new QueueingConsumer(channel);
                channel.basicConsume(queueName, true, consumer);

                //block until a message arrives, then check its contents
                while (true) {
                    QueueingConsumer.Delivery delivery = consumer.nextDelivery();
                    String message = new String(delivery.getBody());
                    if (message.equalsIgnoreCase("runjob")) {
                        int res = ToolRunner.run(new Configuration(), new MyMapReduceJob(), args);
                        break;
                    }
                }
            }
            catch (Exception ex) {
                System.out.println(ex.getMessage());
            }
        }
    }
Publishing Messages to Hadoop

Now with the client set up and listening, we can create a sender. For this article we'll assume the message is being sent from .NET on a Windows server. The first step is to install the RabbitMQ server. Instructions can be found here:

http://www.rabbitmq.com/install-windows.html

Currently this requires setting up Erlang first. Once the server is installed, you can download the client library for .NET, which can be found here:

http://www.rabbitmq.com/dotnet.html

Once everything is ready you can create a new C# console application to send messages. The .NET code is similarly only a few lines. We create a connection using the connection factory, which should point to the name of the server receiving the messages.

    ConnectionFactory factory = new ConnectionFactory();
    factory.Protocol = Protocols.FromEnvironment();
    factory.HostName = "UbuntuMachine";
    IConnection conn = factory.CreateConnection();

Next we create a model and bind to the queue.

    IModel model = conn.CreateModel();
    model.ExchangeDeclare("exch", ExchangeType.Direct);
    model.QueueBind("TESTQUEUE", "exch", "key");

Finally we send our message as a byte array.

    byte[] messageBody = Encoding.UTF8.GetBytes("runjob");
    model.BasicPublish("exch", "key", null, messageBody);

The final completed code, with the surrounding class and using directives, looks like this:

    using System;
    using System.Text;
    using RabbitMQ.Client;

    class Sender
    {
        static void Main()
        {
            ConnectionFactory factory = new ConnectionFactory();
            factory.Protocol = Protocols.FromEnvironment();
            factory.HostName = "UbuntuMachine";
            using (IConnection conn = factory.CreateConnection())
            {
                using (IModel model = conn.CreateModel())
                {
                    model.ExchangeDeclare("exch", ExchangeType.Direct);
                    model.QueueBind("TESTQUEUE", "exch", "key");

                    Console.WriteLine("Sending message.");
                    byte[] messageBody = Encoding.UTF8.GetBytes("runjob");
                    model.BasicPublish("exch", "key", null, messageBody);
                }
            }
        }
    }

And that is it. Now we can send messages to Hadoop from different environments using a simple message queue. This can be especially useful for distributed ETL systems using tools such as SSIS or Pentaho.

Friday, August 17, 2012

Parallel Programming with the C++ AMP library

Microsoft's C++ AMP library is a group of language extensions that allow your C++ code to take advantage of the multiple cores on a GPU. The new code works seamlessly with existing code and makes it easy to increase the parallelism of your programs. The library comes with code for features such as arrays, memory transferring and mathematical functions. To use it, you will first need a copy of Visual Studio 2012, which at present can be downloaded for free as a release candidate. The libraries come installed with it automatically.

Once you have Visual Studio up and running you can begin. Simply create an empty Win32 console application under the Visual C++ project types. Create a .cpp file to hold your main function. To make use of the library you simply need to include the amp.h header file in your code. We'll start with just a simple loop that will modify an array in parallel.

Microsoft gives you two data structures for modifying data in parallel, array and array_view. Here we will copy elements into an array_view so we can act on them.
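A minimal sketch (assuming amp.h and vector are included and using namespace concurrency; the names data and view are illustrative):

std::vector<int> data(3);
data[0] = 1; data[1] = 2; data[2] = 3;

//wrap the vector so GPU code can read and write it
array_view<int, 1> view(3, data);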

Now we will loop through all three members and add 10 to them.
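The parallel loop can be written with parallel_for_each and an amp-restricted lambda, along these lines:

parallel_for_each(view.extent, [=](index<1> idx) restrict(amp) {
    //each thread adds 10 to its own element
    view[idx] += 10;
});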

This code loops three times in parallel, adding 10 to each thread's part of the array. Now we can copy the data back to a vector and display the results.
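A sketch of that step:

//synchronize copies modified data back to the backing vector
view.synchronize();
for (int i = 0; i < 3; i++)
    std::cout << data[i] << std::endl;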

The final code looks like this.
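Assembled into a complete program (a sketch consistent with the steps above; the values are illustrative):

#include <amp.h>
#include <iostream>
#include <vector>
using namespace concurrency;

int main()
{
    std::vector<int> data(3);
    data[0] = 1; data[1] = 2; data[2] = 3;

    //wrap the vector in an array_view
    array_view<int, 1> view(3, data);

    //add 10 to each element in parallel
    parallel_for_each(view.extent, [=](index<1> idx) restrict(amp) {
        view[idx] += 10;
    });

    //copy the results back and display them
    view.synchronize();
    for (int i = 0; i < 3; i++)
        std::cout << data[i] << std::endl;

    return 0;
}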

And that's it for this simple intro. There's plenty more to explore though.

Thursday, March 8, 2012

Intro to Cuda C

This is a simple tutorial on creating an application with NVidia's Cuda toolkit. Cuda allows you to write applications that utilize the GPU for processing. It requires hardware that supports a version of Cuda. If your video card supports it, first make sure to have the latest drivers.

Cuda is a C-like language that produces code to run on a GPU. You can combine both regular C and Cuda C code in a single application and go between them easily. You'll need a regular C compiler and/or environment as well as NVidia's Cuda C compiler. This tutorial will use Visual Studio 2008 but the general ideas work with gcc as well. You will need to download the Cuda toolkit to get the compiler. The 4.1 version is here:
http://developer.nvidia.com/cuda-toolkit-41

I personally had trouble using this version and instead used this one:
http://developer.nvidia.com/cuda-toolkit-32-downloads

The two versions can co-exist without issues. You can also install the SDK to see code samples. Once everything is installed, you can launch Visual Studio and create a new project. Pick a Visual C++ project of type Win32 Console Application. Call it whatever you like and click OK.


At the next Window, click Application Settings and check Empty Project. Then click Finish.


You now have a blank project. Right click on the Source Files folder in the Solution Explorer and choose Add > New Item. Add a new class and call it main.cu and click OK.

We can now start adding code. The .cu file we created will be compiled by the NVidia compiler contained in the toolkit and then can be run. We will see how to set that up later. The NVidia compiler accepts standard C in addition to its own extensions, so it's easy to learn. For our sample we will write a function to add two integers together and store the result in a third. The method will be designed to run on the GPU, using memory allocated on it, and called from regular CPU code. It looks like this:
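A minimal sketch of such a kernel (the function name add is illustrative):

__global__ void add(int a, int b, int *c)
{
    //store the sum in device memory pointed to by c
    *c = a + b;
}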


Here we have two integers and a pointer that will store our result to be passed back to the CPU. Note the use of the __global__ keyword, which marks our function as an entry point for GPU code. Any function called directly from regular C code must be marked with __global__.

To allow the GPU to write a value to our int, we need to allocate some memory for the c parameter. This is done with the cudaMalloc function which is similar to malloc. We just need to tell it that we want to allocate space for an int. The following is our initialization code:
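Something along these lines (dev_c is an illustrative name for the device-side pointer):

int answer;
int *dev_c;

//allocate space on the GPU for a single int
cudaMalloc((void**)&dev_c, sizeof(int));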


Now we can call our add function. A special syntax is used for calling GPU code to pass in the number of blocks and threads we want our function to run on. It takes the form <<<#BLOCKS,#THREADS>>>. This gives us control over parallelism if we wish to take advantage of it. For now we are using only 1 block and 1 thread for this simple calculation. After the function call we copy the result to our int called answer using the cudaMemcpy function. This same function can be used for copying allocated memory from the CPU to the GPU before a method call, such as when we are passing a filled array to the GPU. The code looks like this:
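A sketch of the call and the copy back to host memory:

//run add on the GPU with 1 block and 1 thread
add<<<1,1>>>(2, 7, dev_c);

//copy the result from device memory into answer
cudaMemcpy(&answer, dev_c, sizeof(int), cudaMemcpyDeviceToHost);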


Now we are ready to display our answer. The final code looks like this:
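Assembled into a complete main.cu, a sketch consistent with the steps above:

#include <stdio.h>

__global__ void add(int a, int b, int *c)
{
    *c = a + b;
}

int main(void)
{
    int answer;
    int *dev_c;

    //allocate device memory for the result
    cudaMalloc((void**)&dev_c, sizeof(int));

    //launch the kernel with 1 block and 1 thread
    add<<<1,1>>>(2, 7, dev_c);

    //copy the result back to the host
    cudaMemcpy(&answer, dev_c, sizeof(int), cudaMemcpyDeviceToHost);

    printf("2 + 7 = %d\n", answer);

    cudaFree(dev_c);
    return 0;
}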



Now we are ready to test our code. First we need to add the proper build rule. Right click on our project name in the Solution Explorer and choose Custom Build Rules...


From this list we want to pick the version of Cuda we want to run against. I picked the 3.2 Runtime API rule. It might take some experimenting to find which version works for you if you installed multiple toolkits. Once you have picked, click OK.


Now we need to add the proper library to our project. Right click on the Project in the Solution Explorer and choose Properties.


Go to the General item under Linker and add a path under Additional Library Directories. The path to add is the lib path under the location where you installed the toolkit, plus Win32 or x64 depending on whether your app is 32-bit or 64-bit. For example, the path on my machine is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\lib\Win32 because I am building a 32-bit app.


Next you need to reference a specific lib. In the Input section under Linker, add cudart.lib to the Additional Dependencies line at the top.


Click OK to exit the properties. You are now ready to run your application. Run without debugging and see the result.

Saturday, January 14, 2012

Writing to MySQL in Hadoop

Continuing from the previous post, I will now create a Reducer that writes the baseball results to a MySQL database. This example is mainly helpful for writing summary data and not for tracking ongoing data, as handling keys is tricky. For that you can write a custom output format that uses its own writer to communicate with the database of your choice. For now we will use a simple reducer to write to a simple table. The table I'm writing to looks like this:


mysql> desc records;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| team  | varchar(100) | YES  |     | NULL    |       |
| wins  | int(11)      | YES  |     | NULL    |       |
| loses | int(11)      | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
3 rows in set (0.59 sec)


All we care about are the team name and the wins and losses from the source feed. To start the example, we will show the configuration setup.


DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://127.0.0.1/baseball", "visitor","vispass");
//job set up code here
...

//set up the output format with our table name "records" and our three column names
String [] fields = { "team", "wins", "loses"};
DBOutputFormat.setOutput(job, "records", fields);
//tell the job what reducer to use
job.setReducerClass(ScoreReducer.class);


The configuration line specifies our local MySQL URL and the name of the database, baseball, along with the user name and password. It is important that this configuration line be placed before the creation of the Job class, or the settings will not be persisted to the reducer and output format class. The reducer class takes our Text team key and list of integer values and turns them into win and loss totals, placing them in a database record.


public static class ScoreReducer extends Reducer<Text, IntWritable, BaseballDBWritable, Text> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

        int wins = 0;
        int losses = 0;

        //get an iterator so we can loop through wins and losses
        Iterator<IntWritable> vals = values.iterator();

        while (vals.hasNext()) {
            int n = vals.next().get();

            if (n == 0) //0 indicates a loss
                losses++;
            else //1 indicates a win
                wins++;
        }

        //create our record object to persist the team's wins and losses
        BaseballDBWritable record = new BaseballDBWritable(key.toString(), wins, losses);
        context.write(record, key);
    }
}


To write out to a database, our key class needs to implement the DBWritable interface. In this case I've created a custom one called BaseballDBWritable, which holds the fields for each record. Putting the database record in the key position is necessary due to the way Hadoop passes the information off to the output format. The value class here is unimportant and not used. The custom writable class looks like this:


public static class BaseballDBWritable implements Writable, DBWritable {

    String team;
    int wins;
    int losses;

    public BaseballDBWritable(String team, int wins, int losses) {
        this.team = team;
        this.wins = wins;
        this.losses = losses;
    }

    @Override
    public void readFields(ResultSet arg0) throws SQLException {
    }

    @Override
    public void write(PreparedStatement arg0) throws SQLException {
        //set our values in the order of the columns passed to setOutput
        arg0.setString(1, team);
        arg0.setInt(2, wins);
        arg0.setInt(3, losses);
    }

    @Override
    public void readFields(DataInput arg0) throws IOException {
    }

    @Override
    public void write(DataOutput arg0) throws IOException {
    }
}


All we do here is add our values to the prepared statement created behind the scenes. The default database output format class will handle creating and executing the surrounding SQL. Once all this is done, your job is ready to go. Assuming a blank table from the start and a number of rows in our source data, the output records should look like this:


mysql> select * from records;
+-----------+------+-------+
| team      | wins | loses |
+-----------+------+-------+
| Astros    |    1 |     0 |
| Athletics |    2 |     0 |
| Dodgers   |    0 |     1 |
| Giants    |    1 |     0 |
| Marlins   |    1 |     0 |
| Mets      |    0 |     1 |
| Padres    |    0 |     1 |
| Phillies  |    1 |     0 |
| Rays      |    0 |     1 |
| Red Sox   |    0 |     1 |
| Reds      |    0 |     1 |
| Yankees   |    1 |     1 |
+-----------+------+-------+
12 rows in set (0.03 sec)

Thursday, January 12, 2012

Reading from HBase in Hadoop

A useful application of a MapReduce job is to move data from a large data store and summarize it in a smaller store. I decided to try moving data from HBase into another store. Using the latest library it is pretty easy. I started with a simple table called baseballscores. The first row, with id 'game1', looks like this:

hbase(main):001:0> get 'baseballscores', 'game1'
COLUMN          CELL
 loser:score    timestamp=1325913333740, value=2
 loser:team     timestamp=1325913325984, value=Rays
 winner:score   timestamp=1325913306939, value=5
 winner:team    timestamp=1325913295557, value=Athletics


The table has two column families and two columns per family. To read from this table, we first need to define our configuration. We can use the TableMapReduceUtil class to help set up the mapper job. We need to pass it the name of our table, the mapper class name, the mapper output key class type and the output value class from the mapper.


//create our configuration
Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "HBaseReader");
job.setJarByClass(HBaseMapReduce.class);
Scan scan = new Scan();

//set up mapper job to use baseballscores table
TableMapReduceUtil.initTableMapperJob("baseballscores", scan, HBaseMapper.class, Text.class, Text.class, job);


Now we can create our mapper class to take the data from the HBase table. The values collection contains our column data. To get a specific column, we can call values.getValue and pass in the name of the column family and then the column name. Here we extract the winning and losing team names and then associate them with a win or loss key so we can total them later.

static class HBaseMapper extends TableMapper<Text, Text> {

    @Override
    public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException {

        //read the team names out of the winner and loser column families
        Text winningTeam = new Text(Bytes.toString(values.getValue(Bytes.toBytes("winner"), Bytes.toBytes("team"))));
        Text losingTeam = new Text(Bytes.toString(values.getValue(Bytes.toBytes("loser"), Bytes.toBytes("team"))));

        try {
            //context.write expects Text keys, not bare strings
            context.write(new Text("win"), winningTeam);
            context.write(new Text("loss"), losingTeam);
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}



And that's it. You can write the reducer of your choice and put the data anywhere you like.
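For instance, a minimal reducer that totals wins and losses per team might look like this (a sketch only; TeamReducer and its text output are illustrative, not from the original job):

static class TeamReducer extends Reducer<Text, Text, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        //key is "win" or "loss"; values are the team names from the mapper
        Map<String, Integer> totals = new HashMap<String, Integer>();
        for (Text team : values) {
            String name = team.toString();
            Integer current = totals.get(name);
            totals.put(name, current == null ? 1 : current + 1);
        }
        //emit one count per team, e.g. ("win Athletics", 2)
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            context.write(new Text(key + " " + entry.getKey()),
                    new IntWritable(entry.getValue()));
        }
    }
}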