Welcome to this tutorial on CUDA! Before we get started, we should set up our development environment.
Install the latest version of the "CUDA Toolkit".
Download and follow the instructions at https://developer.nvidia.com/cuda-downloads to install the toolkit corresponding to your system.
In my case, I am using Fedora so I need to download and install:
Linux / x86_64 / Fedora / 39 / rpm (network)
Reboot your system to make sure that everything is set up and initialized properly.
Add the toolkit to your system `PATH`:

```shell
export PATH=/usr/local/cuda-12.4/bin${PATH:+:${PATH}}
```
Once the CUDA toolkit is installed, it is good to verify that everything works as expected. The best way to do that is to build and run the cuda-samples:
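For example (this clones NVIDIA's official cuda-samples repository; older releases build with `make`, while newer ones have moved to CMake):

```shell
# Fetch and build the official CUDA samples
git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples
make
```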
You can then run a sample such as:
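```shell
cd Samples/1_Utilities/deviceQuery
./deviceQuery
```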
Now that we know that the CUDA toolkit is installed and working, let's write our first CUDA program.
Create a new file named `hello.cu` with the following code:
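```cuda
#include <stdio.h>

// __global__ marks a function that runs on the GPU: a "CUDA kernel".
__global__ void hello()
{
    printf("hello world!\n");
}

int main()
{
    // Launch the kernel on a single thread: 1 block of 1 thread.
    hello<<<1, 1>>>();

    // Kernels run asynchronously; wait for the GPU to finish.
    cudaDeviceSynchronize();

    return 0;
}
```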
We can immediately see the resemblance to usual C code; however, some things are different.

The `__global__` attribute signals that the given function should run on the GPU instead of the CPU. In CUDA parlance such a function is known as a "CUDA kernel".

To launch our `hello` kernel we must use the syntax `hello<<<1,1>>>()`. The meaning of the `<<<1,1>>>` will be explained in detail in the next article, but in this case it specifies that the kernel should run on only a single thread.

Finally, we must call `cudaDeviceSynchronize()` to ensure that the kernel finishes executing, since kernels on the GPU run asynchronously to the CPU.

Note that we can use `printf()` on the GPU (and CPU) with `#include <stdio.h>`.
To compile our program we must use the `nvcc` compiler: `nvcc hello.cu -o hello`. After running our program with `./hello` we should see "hello world!" printed on the screen.
When developing CUDA programs, a number of tools can be useful:

- `nvidia-smi`: reports GPU utilization, memory usage, and the driver version
- `nvtop`: a top-like interactive GPU monitor
- `lspci -v | grep VGA`: lists the GPU hardware
- `cat /proc/driver/nvidia/version`: shows the installed NVIDIA driver version
- `lspci -nn -k | grep -A 2 -e VGA -e 3D`: shows the GPU along with the kernel driver in use
- `nvcc --version`: shows the installed CUDA toolkit version
The cuda-samples repository contains useful tools as well:

- `./Samples/1_Utilities/deviceQuery`
- `./Samples/1_Utilities/bandwidthTest`
It is also worthwhile to learn to use a CUDA debugger such as the Nsight Visual Studio Code Edition. As a bonus you also get syntax highlighting and intellisense.
For this to work, make sure that the CUDA toolkit is in your system `PATH` and compile kernels with `nvcc -g -G -O0`.
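With the Nsight Visual Studio Code Edition extension installed, a minimal `.vscode/launch.json` could look like the following sketch (the `cuda-gdb` debugger type is provided by the extension; adjust the program path to your own binary):

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA: launch hello",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "${workspaceFolder}/hello"
        }
    ]
}
```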
This concludes the first article. Our first kernel didn't do much, in fact it didn't take advantage of the GPU at all since it was running on a single thread. In the next one we will see how to write a more useful kernel and wake up that GPU!
The source code for this article is available on GitHub.