Thursday, March 8, 2012

Intro to Cuda C

This is a simple tutorial on creating an application with NVidia's Cuda toolkit. Cuda allows you to write applications that use the GPU for processing. It requires an NVidia GPU that supports Cuda. If your video card supports it, first make sure you have the latest drivers installed.
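Once the toolkit described below is installed, a short program can confirm that the Cuda runtime actually sees your card. This is a minimal sketch. It assumes the file is saved with a .cu extension and built with the toolkit's nvcc compiler. cudaGetDeviceCount and cudaGetDeviceProperties are standard runtime API calls; the helper count_cuda_devices is a hypothetical name used only for illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper: number of Cuda devices the runtime can see.
   Returns 0 on any error (no Cuda GPU, or a driver too old for this runtime). */
int count_cuda_devices(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return 0;
    return count;
}

int main(void)
{
    int n = count_cuda_devices();
    if (n == 0) {
        printf("No Cuda-capable device found (check hardware and drivers).\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        /* major.minor is the compute capability the card supports */
        printf("Device %d: %s, compute capability %d.%d, %lu MB\n",
               i, prop.name, prop.major, prop.minor,
               (unsigned long)(prop.totalGlobalMem / (1024 * 1024)));
    }
    return 0;
}
```

If the program reports no device even though the card should support Cuda, an outdated driver is a likely cause.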

Cuda is a C-like language that produces code to run on a GPU. You can combine regular C and Cuda C code in a single application and move between them easily. You'll need a regular C compiler and/or development environment as well as NVidia's Cuda C compiler (nvcc). This tutorial uses Visual Studio 2008, but the general ideas also apply to gcc. You will need to download the Cuda toolkit to get the compiler. Version 4.1 is here:
http://developer.nvidia.com/cuda-toolkit-41

I personally had trouble using this version and used version 3.2 instead:
http://developer.nvidia.com/cuda-toolkit-32-downloads

The two versions can coexist without issues. You can also install the SDK to see code samples. Once everything is installed, launch Visual Studio and create a new project. Pick a Visual C++ project of type Win32 Console Application, name it whatever you like, and click OK.


In the next window, click Application Settings, check Empty Project, and click Finish.


You now have a blank project. Right-click the Source Files folder in Solution Explorer and choose Add > New Item. Add a new C++ file, name it main.cu, and click OK.

We can now start adding code. The .cu file we created will be compiled by NVidia's compiler (nvcc), which is included in the toolkit. We will see how to set that up later. The NVidia compiler accepts standard C in addition to its own extensions, so it's easy to learn. For our sample we will write a function that adds two integers and stores the result in a third. The function will run on the GPU, write its result to memory allocated on the GPU, and be called from regular CPU code. It looks like this:
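Here is a minimal sketch of such a kernel. It follows the description in the next paragraph, which supplies the names add, a, b and c. The layout of the author's own listing may differ. On its own this compiles, but it does nothing until host code allocates memory for c and launches it.

```cuda
/* Kernel: runs on the GPU. a and b are passed by value; c must point to
   GPU memory, because GPU code cannot write into ordinary CPU variables. */
__global__ void add(int a, int b, int *c)
{
    *c = a + b;
}
```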


Here we have two integers and a pointer that will hold our result so it can be passed back to the CPU. Note the __global__ keyword, which marks our function as an entry point for GPU code (a kernel). Any GPU function called directly from regular C code must be marked __global__.
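The following small sketch contrasts __global__ with __device__. The __device__ qualifier is for helper functions that run on the GPU but can only be called from other GPU code. The names square, sum_of_squares and sum_of_squares_on_gpu are hypothetical. The host helper uses the cudaMalloc and cudaMemcpy calls explained below.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* __device__: runs on the GPU and can only be called from GPU code */
__device__ int square(int x)
{
    return x * x;
}

/* __global__: a GPU entry point that CPU code launches with <<<...>>> */
__global__ void sum_of_squares(int a, int b, int *c)
{
    *c = square(a) + square(b);
}

/* Hypothetical host helper: allocates GPU memory, launches, copies back */
int sum_of_squares_on_gpu(int a, int b)
{
    int result = 0;
    int *dev_c = NULL;
    cudaMalloc((void **)&dev_c, sizeof(int));
    sum_of_squares<<<1, 1>>>(a, b, dev_c);
    cudaMemcpy(&result, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_c);
    return result;
}

int main(void)
{
    printf("3^2 + 4^2 = %d\n", sum_of_squares_on_gpu(3, 4));
    return 0;
}
```

Calling square directly from main would fail to compile, because __device__ functions cannot be called from CPU code.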

To allow the GPU to write a value to our int, we need to allocate GPU memory for the c parameter. This is done with the cudaMalloc function, which is similar to malloc but allocates memory on the GPU. We just need to tell it that we want space for one int. The following is our initialization code:
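This minimal sketch shows the allocation step on its own. It checks the return code so that a failure (no device, or a driver that is too old) is reported instead of silently ignored. The helper alloc_device_int is a hypothetical name. cudaMalloc, cudaFree and cudaGetErrorString are runtime API functions.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper: allocates room for one int in GPU memory.
   Returns NULL if the allocation fails. */
int *alloc_device_int(void)
{
    int *dev_c = NULL;
    /* Like malloc, but the memory lives on the GPU. cudaMalloc takes the
       address of our pointer (hence the void ** cast) and a size in bytes. */
    cudaError_t err = cudaMalloc((void **)&dev_c, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return NULL;
    }
    return dev_c;
}

int main(void)
{
    int *dev_c = alloc_device_int();
    if (dev_c == NULL)
        return 1;
    printf("Allocated %u bytes on the GPU\n", (unsigned)sizeof(int));
    /* dev_c is a GPU address: pass it to kernels, never dereference it on the CPU */
    cudaFree(dev_c); /* GPU counterpart of free() */
    return 0;
}
```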


Now we can call our add function. GPU functions are called with a special syntax that passes in the number of blocks, and the number of threads per block, that the function should run on. It takes the form <<<#BLOCKS,#THREADS>>>, which gives us control over parallelism if we wish to take advantage of it. For now we use only 1 block and 1 thread for this simple calculation. After the call we copy the result into our int called answer using the cudaMemcpy function. The same function can copy data from the CPU to the GPU before a kernel call, for example when passing a filled array to the GPU. The code looks like this:
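The array case mentioned above uses cudaMemcpy in both directions and launches more than one thread. This sketch rests on three assumptions:

- The kernel and helper names (add_arrays, add_arrays_on_gpu) are hypothetical.
- All elements fit in a single block.
- Each thread uses the built-in threadIdx.x to pick its element.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* One thread per element: threadIdx.x is the thread's index within its block */
__global__ void add_arrays(const int *a, const int *b, int *c)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

/* Hypothetical helper. n must not exceed the device's threads-per-block
   limit (512 on older cards), since everything runs in a single block. */
void add_arrays_on_gpu(const int *a, const int *b, int *c, int n)
{
    size_t bytes = n * sizeof(int);
    int *dev_a, *dev_b, *dev_c;
    cudaMalloc((void **)&dev_a, bytes);
    cudaMalloc((void **)&dev_b, bytes);
    cudaMalloc((void **)&dev_c, bytes);

    /* CPU -> GPU: copy the filled input arrays before the launch */
    cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    /* <<<blocks, threads per block>>>: 1 block of n threads */
    add_arrays<<<1, n>>>(dev_a, dev_b, dev_c);

    /* GPU -> CPU: copy the results back (this waits for the kernel to finish) */
    cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
}

int main(void)
{
    int a[4] = {1, 2, 3, 4};
    int b[4] = {10, 20, 30, 40};
    int c[4] = {0};
    add_arrays_on_gpu(a, b, c, 4);
    for (int i = 0; i < 4; ++i)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    return 0;
}
```

With <<<1, 1>>>, as in the tutorial, only one thread would run and only c[0] would be computed. <<<1, n>>> gives every element its own thread.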


Now we are ready to display our answer. The final code looks like this:
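This is a minimal sketch of the complete main.cu, assembled from the steps described above. The input values 3 and 4 are an assumption for illustration, and the printf formatting may differ from the author's version.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* GPU entry point: adds a and b and stores the sum in GPU memory at c */
__global__ void add(int a, int b, int *c)
{
    *c = a + b;
}

int main(void)
{
    int a = 3, b = 4;   /* example inputs (assumed values) */
    int answer = 0;     /* receives the result on the CPU */
    int *dev_c = NULL;  /* will point to GPU memory */

    /* Allocate space for one int on the GPU */
    cudaMalloc((void **)&dev_c, sizeof(int));

    /* Launch the kernel with 1 block of 1 thread */
    add<<<1, 1>>>(a, b, dev_c);

    /* Copy the result from GPU memory into answer */
    cudaMemcpy(&answer, dev_c, sizeof(int), cudaMemcpyDeviceToHost);

    printf("%d + %d = %d\n", a, b, answer);

    /* Release the GPU memory */
    cudaFree(dev_c);
    return 0;
}
```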



Now we are ready to test our code. First we need to add the proper build rule. Right-click the project name in Solution Explorer and choose Custom Build Rules...


From this list, pick the Cuda version you want to build against. I picked the 3.2 Runtime API rule. If you installed multiple toolkits, it may take some experimenting to find which version works for you. Once you have picked one, click OK.
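If you are unsure which rule to pick, a short program can report which Cuda version the installed driver supports and which runtime version the program was built against. This sketch uses the runtime functions cudaDriverGetVersion and cudaRuntimeGetVersion. The helpers version_major and version_minor are hypothetical; they decode the 1000 * major + 10 * minor encoding these functions use.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Cuda encodes versions as 1000 * major + 10 * minor, e.g. 3020 for 3.2 */
int version_major(int v) { return v / 1000; }
int version_minor(int v) { return (v % 1000) / 10; }

int main(void)
{
    int driver = 0, runtime = 0;

    /* Latest Cuda version the installed driver supports (0 if no driver) */
    cudaDriverGetVersion(&driver);
    /* Version of the Cuda runtime this program was built against */
    cudaRuntimeGetVersion(&runtime);

    printf("Driver supports Cuda %d.%d\n",
           version_major(driver), version_minor(driver));
    printf("Built against runtime %d.%d\n",
           version_major(runtime), version_minor(runtime));

    if (driver < runtime)
        printf("Driver is older than the runtime: update the driver "
               "or try an older toolkit's build rule.\n");
    return 0;
}
```

A driver version lower than the runtime version usually means the driver needs updating or an older toolkit rule should be used.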


Now we need to add the proper library to our project. Right-click the project in Solution Explorer and choose Properties.


Under Linker, go to the General item and add a path to Additional Library Directories. The path to add is the lib folder under the location where you installed the toolkit, plus the subfolder for your app's platform (32-bit or 64-bit). For example, the path on my machine is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\lib\Win32 because I am building a 32-bit app.


Next, reference the Cuda runtime library itself. In the Input section under Linker, add cudart.lib to the Additional Dependencies line at the top.


Click OK to close the properties. You are now ready to run your application. Run it without debugging (Ctrl+F5) and check the result.
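Every Cuda runtime call returns a cudaError_t, and a kernel launch reports problems through cudaGetLastError. A wrong or garbage answer therefore usually means one of these calls failed unnoticed. This sketch checks each step. The CHECK macro and the run_add helper are hypothetical names; the error functions are part of the runtime API.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical macro: on failure, print which call failed and why, then return 1
   from the enclosing function (so it only works inside functions returning int). */
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(err_));                         \
            return 1;                                                  \
        }                                                              \
    } while (0)

__global__ void add(int a, int b, int *c)
{
    *c = a + b;
}

/* Hypothetical helper: stores a + b in *out and returns 0 on success, 1 on error.
   (For brevity, dev_c is not freed on the error paths.) */
int run_add(int a, int b, int *out)
{
    int *dev_c = NULL;
    CHECK(cudaMalloc((void **)&dev_c, sizeof(int)));
    add<<<1, 1>>>(a, b, dev_c);
    CHECK(cudaGetLastError());  /* catches launch failures, e.g. a bad launch configuration */
    /* cudaMemcpy waits for the kernel, so it can also report errors raised while it ran */
    CHECK(cudaMemcpy(out, dev_c, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dev_c));
    return 0;
}

int main(void)
{
    int answer = 0;
    if (run_add(3, 4, &answer) != 0)
        return 1;
    printf("3 + 4 = %d\n", answer);
    return 0;
}
```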