CUDA - Events (measuring timings)
In the article on SAXPY we saw how to measure kernel execution time using CPU timers. That is fine for getting a quick idea of the timings, but CUDA events give a more accurate measurement.
API
cudaEventCreate()
: Create an event.
cudaEventDestroy()
: Destroy an event.
cudaEventRecord()
: Record an event.
cudaEventSynchronize()
: Wait for the event to complete.
cudaEventElapsedTime()
: Get the time between two recorded events.
See the documentation for details.
Usage
CUDA events allow creating GPU timestamps that can be used to get the elapsed time between two points in time.
Events are recorded in a given CUDA stream; passing 0 selects the default stream.
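To illustrate the stream argument, here is a minimal sketch that times a kernel in a non-default stream. The kernel `scale`, the helper `time_scale_ms` and the problem size are hypothetical names chosen for this example; the point is only that both events and the kernel launch are issued to the same stream, so the two timestamps bracket the kernel.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to give the GPU some work to time.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times one launch of `scale` in the given stream; returns milliseconds.
float time_scale_ms(cudaStream_t stream, float *d_data, int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Events and kernel all go to the same stream, in this order.
    cudaEventRecord(start, stream);
    scale<<<blocks, threads, 0, stream>>>(d_data, n, 2.0f);
    cudaEventRecord(stop, stream);

    // Block the CPU until the "stop" event has been reached on the GPU.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = NULL;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream); // a non-default stream

    printf("elapsed time: %f ms\n", time_scale_ms(stream, d_data, n));

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return 0;
}
```

The fourth launch parameter selects the stream and the third is the dynamic shared memory size, which is 0 here.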
Events do not need to be recreated to take multiple measurements: cudaEventRecord() can be called multiple times on the same event, since every call first resets the event.
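As a sketch of reusing events, the following creates one pair of events and records them again on every iteration of a loop. The kernel `add_one`, the helper `fastest_run_ms` and the number of runs are assumptions made for this example, not part of the original code. Keeping the fastest run is just one possible way to summarize the measurements; the first run is typically slower because of warm-up effects.

```cuda
#include <stdio.h>
#include <float.h>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to give the GPU some work to time.
__global__ void add_one(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Runs the kernel `num_runs` times, reusing the same two events,
// and returns the fastest time in milliseconds
// (FLT_MAX if num_runs is 0).
float fastest_run_ms(float *d_data, int n, int num_runs)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // The events are created once, outside the loop.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float best_ms = FLT_MAX;
    for (int run = 0; run < num_runs; ++run) {
        cudaEventRecord(start, 0); // replaces the previous recording
        add_one<<<blocks, threads>>>(d_data, n);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("run %d: %f ms\n", run, ms);
        if (ms < best_ms) best_ms = ms;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return best_ms;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = NULL;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    printf("fastest: %f ms\n", fastest_run_ms(d_data, n, 5));

    cudaFree(d_data);
    return 0;
}
```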
    #include <stdio.h> // printf

    int main()
    {
        // create events
        cudaStream_t stream = 0; // use default CUDA stream
        cudaEvent_t start;
        cudaEvent_t stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        // start recording
        cudaEventRecord(start, stream);

        // launch kernel
        kernel<<<num_blocks, threads_per_block>>>(...);

        // stop recording
        cudaEventRecord(stop, stream);

        // wait for events to finish
        cudaEventSynchronize(stop);

        // get elapsed time
        float elapsed_ms = 0.0f;
        cudaEventElapsedTime(&elapsed_ms, start, stop);
        printf("elapsed time: %f ms\n", elapsed_ms);

        // destroy events
        cudaEventDestroy(start);
        cudaEventDestroy(stop);

        return 0;
    }
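The listing above leaves out the kernel and ignores return codes. Below is a self-contained variant with error checking, offered as a sketch rather than the article's actual source. The `CUDA_CHECK` macro, the `saxpy` kernel signature, the helper `time_saxpy_ms` and the problem size are all assumptions introduced for this example. Every CUDA runtime call used here returns a `cudaError_t`, and kernel launch errors are picked up with `cudaGetLastError()`.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA call fails.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Assumed SAXPY kernel: y = a * x + y.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Times one SAXPY launch on n zero-filled elements; returns milliseconds.
float time_saxpy_ms(int n)
{
    float *d_x = NULL, *d_y = NULL;
    CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&d_y, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_x, 0, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_y, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    CUDA_CHECK(cudaEventRecord(start, 0));
    saxpy<<<blocks, threads>>>(n, 2.0f, d_x, d_y);
    CUDA_CHECK(cudaGetLastError()); // reports invalid launch configurations
    CUDA_CHECK(cudaEventRecord(stop, 0));
    CUDA_CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));
    return ms;
}

int main()
{
    printf("elapsed time: %f ms\n", time_saxpy_ms(1 << 20));
    return 0;
}
```

Checking `cudaEventElapsedTime()` is worthwhile because it fails with `cudaErrorNotReady` when either event has not completed yet, for instance if the `cudaEventSynchronize()` call is forgotten.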
The way I believe this works under the hood is:
    // GPU: commands queue
    []

    // CPU: cudaEventRecord(start, stream);
    [ record timestamp to event "start" ]

    // CPU: kernel<<<num_blocks, threads_per_block>>>(...);
    // (GPU can start executing operations from this point)
    [ record timestamp to event "start" ]
    [ launch kernel "kernel" ]

    // CPU: cudaEventRecord(stop, stream);
    [ record timestamp to event "start" ]
    [ launch kernel "kernel" ]
    [ record timestamp to event "stop" ]

    // GPU: record timestamp to event "start"
    [ launch kernel "kernel" ]
    [ record timestamp to event "stop" ]

    // GPU: launch kernel "kernel"
    [ record timestamp to event "stop" ]

    // GPU: record timestamp to event "stop"
    []

    // CPU: cudaEventSynchronize(stop);
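One way to observe this asynchronous behaviour is to poll the "stop" event with `cudaEventQuery()` instead of blocking on it. That call returns `cudaErrorNotReady` while the event is still pending and `cudaSuccess` once the GPU has reached it. The sketch below counts how often the CPU polls before the event completes. The kernel `spin`, the helper `count_polls_until_done` and the iteration count are hypothetical, and the number of polls depends on the hardware and driver, so it can legitimately be 0.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel that keeps each thread busy for a while.
__global__ void spin(float *out, int iterations)
{
    float v = (float)threadIdx.x;
    for (int i = 0; i < iterations; ++i) v = v * 1.0001f + 0.5f;
    out[blockIdx.x * blockDim.x + threadIdx.x] = v;
}

// Enqueues start, kernel and stop, then polls "stop" from the CPU.
// Returns the number of polls that saw the event still pending,
// or -1 if polling ended with an error.
int count_polls_until_done(float *d_out, int blocks, int threads, int iterations)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // These three calls only enqueue work; they do not wait for the GPU.
    cudaEventRecord(start, 0);
    spin<<<blocks, threads>>>(d_out, iterations);
    cudaEventRecord(stop, 0);

    int polls = 0;
    cudaError_t status;
    while ((status = cudaEventQuery(stop)) == cudaErrorNotReady) {
        ++polls; // the GPU has not reached "stop" yet
    }

    if (status == cudaSuccess) {
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("kernel took %f ms, CPU polled %d times\n", ms, polls);
    } else {
        polls = -1;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return polls;
}

int main()
{
    const int blocks = 64;
    const int threads = 256;
    float *d_out = NULL;
    cudaMalloc(&d_out, blocks * threads * sizeof(float));

    count_polls_until_done(d_out, blocks, threads, 1 << 20);

    cudaFree(d_out);
    return 0;
}
```

A non-zero poll count is consistent with the queue model above: the CPU has already returned from the second `cudaEventRecord()` while the GPU is still working through the earlier commands.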
The source code for this article is available on GitHub.