When first learning CUDA, we usually define and launch kernels in the same .cu file and compile it with the CUDA compiler nvcc. But what if we want to integrate CUDA into an existing C codebase? In this article, I will show you how to launch CUDA kernels from ordinary C code.
Starting from the basic "hello world" example:
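A minimal single-file version might look like the following (file and kernel names here are illustrative):

```cuda
// hello.cu -- kernel and host code in one file, compiled with nvcc
#include <stdio.h>

// runs on the GPU
__global__ void hello_kernel(void)
{
    printf("Hello world from the GPU!\n");
}

int main(void)
{
    // launch 1 block of 1 thread
    hello_kernel<<<1, 1>>>();
    // wait for the GPU to finish before the program exits
    cudaDeviceSynchronize();
    return 0;
}
```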
We move the main() function to app.c, leaving only the CUDA kernel in kernel.cu.
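kernel.cu then keeps the kernel and adds a wrapper around the launch; launch_kernel is an illustrative name for it:

```cuda
// kernel.cu -- the kernel plus a C-callable launch wrapper
#include <stdio.h>

__global__ void hello_kernel(void)
{
    printf("Hello world from the GPU!\n");
}

// extern "C" disables C++ name mangling so the C linker
// can find this symbol
extern "C" void launch_kernel(void)
{
    hello_kernel<<<1, 1>>>();
}
```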
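A small header, here assumed to be named kernel.h, declares the wrapper for C callers:

```c
/* kernel.h -- declares the launch wrapper for C code */
#pragma once

void launch_kernel(void);
```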
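app.c is then ordinary C; a sketch, assuming the header above:

```c
/* app.c -- plain C, compiled with a C compiler such as gcc */
#include <stdio.h>
#include <cuda_runtime.h>   /* CUDA runtime API, e.g. cudaDeviceSynchronize() */
#include "kernel.h"

int main(void)
{
    printf("Hello from the CPU!\n");

    /* launch the GPU kernel through the C wrapper */
    launch_kernel();

    /* block until all queued GPU work has completed */
    cudaDeviceSynchronize();

    return 0;
}
```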
We must also wrap the kernel launch in a plain function to be able to call it from normal C code, since the <<< >>> launch syntax is specific to the CUDA compiler nvcc and not supported by C compilers. Note that nvcc compiles .cu files as C++, so the wrapper has to be declared extern "C" to keep its symbol from being name-mangled; otherwise the C linker will not find it.
Finally, we must include <cuda_runtime.h> in app.c to access functions from the CUDA runtime API such as cudaDeviceSynchronize().
To build the application, we first compile the CUDA kernel with the CUDA compiler, without linking:
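With the file names used above:

```shell
nvcc -c kernel.cu -o kernel.o
```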
The -c flag compiles the kernel.cu source file into an object file, kernel.o, without linking it into an executable.
We then compile the rest of the app using a C compiler and link it against the object file and the CUDA libraries:
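Assuming a default CUDA installation under /usr/local/cuda (adjust the paths to your system), the command might be:

```shell
gcc app.c kernel.o -o app \
    -I/usr/local/cuda/include \
    -L/usr/local/cuda/lib64 -lcuda -lcudart
```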
Linking to CUDA is done via:
-lcuda : libcuda.so (the CUDA driver API)
-lcudart : libcudart.so (the CUDA runtime API)

The -I flag adds a directory to the include search path, and the -L flag adds a directory to the shared library (.so) search path.
That's it, you can now enjoy CUDA within your existing C code!