When first learning CUDA, we usually define and run kernels within the same .cu file and compile it with the CUDA compiler nvcc. But what if we want to integrate CUDA into an existing C codebase? In this article, I will show you how to launch CUDA kernels from ordinary C code.
Starting from the basic "hello world" example:
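The single-file version might look something like this (a minimal sketch; the kernel name, grid dimensions, and message are illustrative):

```cuda
// hello.cu -- everything in one file, compiled with: nvcc hello.cu -o hello
#include <stdio.h>

// GPU kernel: each thread prints its own index.
__global__ void hello(void)
{
    printf("Hello from GPU thread %d!\n", threadIdx.x);
}

int main(void)
{
    hello<<<1, 4>>>();        // launch 1 block of 4 threads
    cudaDeviceSynchronize();  // wait for the kernel to finish
    return 0;
}
```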
We move the main() function to app.c, leaving only the CUDA kernel in kernel.cu.
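kernel.cu might then look like this (a sketch; the kernel and wrapper names are illustrative):

```cuda
// kernel.cu -- the CUDA kernel plus a C-callable wrapper
#include <stdio.h>
#include "kernel.h"

__global__ void hello(void)
{
    printf("Hello from GPU thread %d!\n", threadIdx.x);
}

// Wrapper around the <<< >>> launch syntax so plain C code can call it.
// extern "C" prevents C++ name mangling, letting the C linker find it.
extern "C" void launch_hello(void)
{
    hello<<<1, 4>>>();
}
```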
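A small header, kernel.h, declares the wrapper so app.c can call it (names are illustrative, matching the sketch above):

```c
/* kernel.h -- C-visible declaration of the kernel launcher */
#ifndef KERNEL_H
#define KERNEL_H

void launch_hello(void);  /* implemented in kernel.cu */

#endif
```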
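And app.c, now plain C, might look like this (again a sketch with illustrative names):

```c
/* app.c -- ordinary C code that launches the CUDA kernel */
#include <cuda_runtime.h>  /* CUDA runtime API, e.g. cudaDeviceSynchronize() */
#include "kernel.h"

int main(void)
{
    launch_hello();           /* C wrapper defined in kernel.cu */
    cudaDeviceSynchronize();  /* wait for the GPU to finish */
    return 0;
}
```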
We must also wrap the kernel launch in a C function to be able to call it from normal C code, since the <<< >>> launch syntax is specific to the CUDA compiler nvcc and not supported by C compilers. Note that nvcc compiles .cu files as C++, so the wrapper must be declared extern "C"; otherwise its name gets mangled and the C linker cannot find it.
Finally, we must include <cuda_runtime.h> in app.c to access functions from the CUDA runtime API, such as cudaDeviceSynchronize().
To build the application, we must first compile the CUDA kernel, without linking, using the CUDA compiler:
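For example (file names match the sketches above):

```shell
nvcc -c kernel.cu -o kernel.o
```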
The -c flag compiles the kernel.cu source file into an object file, kernel.o, without linking it into an executable.
We then compile the rest of the app using a C compiler:
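Something along these lines (assuming a default CUDA installation under /usr/local/cuda; adjust the paths for your system):

```shell
gcc app.c kernel.o -o app \
    -I/usr/local/cuda/include \
    -L/usr/local/cuda/lib64 -lcuda -lcudart
```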
Linking to CUDA is done via:

- -lcuda: libcuda.so (CUDA driver API)
- -lcudart: libcudart.so (CUDA runtime API)

The -I flag adds a directory to the include path, and the -L flag adds a directory to the shared library (.so) search path.
That's it, you can now enjoy CUDA within your existing C code!