CUDA - Events (measuring timings)

In the article on SAXPY we saw how to measure kernel execution time using CPU timers. That is fine for getting a quick idea of the timings, but CUDA events offer a more accurate way to do it.

API

See the documentation for details.

Usage

CUDA events record GPU timestamps that can be used to get the elapsed time between two points in time.

Events are recorded on a given CUDA stream; passing 0 selects the default stream.
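
As an illustration of recording events on a stream other than the default one, here is a minimal sketch. The kernel `add_one` and the helper `time_on_stream` are hypothetical names made up for this example; only the `cuda*` runtime calls come from the CUDA API. It assumes the events and the kernel should all be enqueued on the same stream so that they execute in order.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: adds 1 to every element.
__global__ void add_one(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Times one launch of add_one on a newly created (non-default) stream.
// Returns the elapsed time in milliseconds, or -1.0f if a CUDA call fails.
float time_on_stream(int n)
{
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1.0f;

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        cudaFree(d_data);
        return -1.0f;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Initialize the buffer on the same stream, before the timed region.
    cudaMemsetAsync(d_data, 0, n * sizeof(float), stream);

    // Events and kernel go into the same stream, so they execute in order.
    cudaEventRecord(start, stream);
    add_one<<<blocks, threads, 0, stream>>>(d_data, n);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (cudaEventElapsedTime(&ms, start, stop) != cudaSuccess) ms = -1.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(d_data);
    return ms;
}
```

Calling `time_on_stream(1 << 20)` from `main()` would time a single launch over about one million elements. Passing 0 instead of `stream` in the three enqueueing calls would give the default-stream behaviour.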

Events do not need to be recreated to take multiple measurements: cudaEventRecord() can be called multiple times on the same event, since each call overwrites the event's previous state.

#include <stdio.h> // printf
#include <cuda_runtime.h> // CUDA runtime API (nvcc includes it implicitly for .cu files)

int main()
{
    // create events
    cudaStream_t stream = 0; // use default CUDA stream
    cudaEvent_t start;
    cudaEvent_t stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // start recording
    cudaEventRecord(start, stream);

    // launch kernel on the same stream as the events
    kernel<<<num_blocks, threads_per_block, 0, stream>>>(...);

    // stop recording
    cudaEventRecord(stop, stream);
    // block the CPU until the "stop" event has been recorded by the GPU
    cudaEventSynchronize(stop);

    // get elapsed time
    float elapsed_ms = 0.0f;
    cudaEventElapsedTime(&elapsed_ms, start, stop);
    printf("elapsed time: %f ms\n", elapsed_ms);

    // destroy events
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    return 0;
}
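
Because cudaEventRecord() overwrites an event's previous state, one pair of events can time several launches in a row. The following is a minimal sketch of that pattern, not a definitive benchmark harness. The kernel `scale` and the helper `time_runs` are hypothetical names introduced for this example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel used only for illustration: scales an array in place.
__global__ void scale(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times `runs` launches of scale, reusing one pair of events.
// Returns one elapsed time in milliseconds per successful measurement;
// the vector is empty if the device buffer cannot be allocated.
std::vector<float> time_runs(int n, int runs)
{
    std::vector<float> timings;
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return timings;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    for (int r = 0; r < runs; ++r) {
        cudaEventRecord(start, 0);   // overwrites the previous state of "start"
        scale<<<blocks, threads>>>(d_data, n, 2.0f);
        cudaEventRecord(stop, 0);    // overwrites the previous state of "stop"
        cudaEventSynchronize(stop);  // finish this measurement before re-recording

        float ms = 0.0f;
        if (cudaEventElapsedTime(&ms, start, stop) == cudaSuccess) {
            timings.push_back(ms);
        }
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return timings;
}
```

The first measurement is often slower than the following ones because of one-time startup costs, so it can be worth discarding it or averaging over the remaining runs.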

The way I believe this works under the hood is:

// GPU: commands queue
[]

// CPU: cudaEventRecord(start, stream);
[ record timestamp to event "start" ]

// CPU: kernel<<<num_blocks, threads_per_block>>>(...);
// (GPU can start executing operations from this point)
[ record timestamp to event "start" ]
[ launch kernel "kernel" ]

// CPU: cudaEventRecord(stop, stream);
[ record timestamp to event "start" ]
[ launch kernel "kernel" ]
[ record timestamp to event "stop" ]

// GPU: record timestamp to event "start"
[ launch kernel "kernel" ]
[ record timestamp to event "stop" ]

// GPU: launch kernel "kernel"
[ record timestamp to event "stop" ]

// GPU: record timestamp to event "stop"
[]

// CPU: cudaEventSynchronize(stop);
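
One way to observe this queueing behaviour from the CPU side is to poll cudaEventQuery() instead of blocking in cudaEventSynchronize(): it returns cudaErrorNotReady while the "stop" event has not yet been recorded, and cudaSuccess once it has. The sketch below is only an illustration of that idea. The kernel `spin`, the struct `PollResult` and the helper `poll_until_recorded` are hypothetical names, and the number of polls depends entirely on the hardware and driver.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: keeps the GPU busy for a while.
__global__ void spin(float* out, int iterations)
{
    float x = 1.0f;
    for (int i = 0; i < iterations; ++i) x = x * 1.000001f + 0.000001f;
    // Store the result so the loop is not optimized away.
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

struct PollResult {
    bool ok;     // false if any CUDA call failed
    long polls;  // how many times cudaEventQuery() reported "not ready"
    float ms;    // elapsed GPU time between the two events
};

PollResult poll_until_recorded(int iterations)
{
    PollResult result = {false, 0, 0.0f};
    const int threads = 256;
    const int blocks = 64;

    float* d_out = nullptr;
    if (cudaMalloc(&d_out, blocks * threads * sizeof(float)) != cudaSuccess) {
        return result;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // These three calls enqueue work and normally return without waiting
    // for the GPU to execute it.
    cudaEventRecord(start, 0);
    spin<<<blocks, threads>>>(d_out, iterations);
    cudaEventRecord(stop, 0);

    // The CPU is free here: poll the event instead of blocking on it.
    cudaError_t status;
    while ((status = cudaEventQuery(stop)) == cudaErrorNotReady) {
        ++result.polls;
    }

    if (status == cudaSuccess &&
        cudaEventElapsedTime(&result.ms, start, stop) == cudaSuccess) {
        result.ok = true;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return result;
}
```

A non-zero `polls` value after a call such as `poll_until_recorded(1 << 16)` would be consistent with the model above: the CPU got past cudaEventRecord(stop, ...) while the GPU was still working through its queue.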

The source code for this article is available on GitHub.