Manual Memory Management

Objectives

  • Perform explicit memory allocation and data transfers with cudaMalloc/cudaMemcpy

Instructor note

  • 15 min teaching

  • 0 min exercises

Classical (manual) memory management

Before managed memory became available, and still today whenever full control over data movement is desired, memory has to be allocated explicitly on the device and data copied explicitly between host and device.

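double *Y_d, *X_d, *Y, *X;

// Allocate host and device buffers (1024 doubles each)
Y = (double*)malloc(1024 * sizeof(double));
X = (double*)malloc(1024 * sizeof(double));
cudaMalloc((void**)&Y_d, 1024 * sizeof(double));
cudaMalloc((void**)&X_d, 1024 * sizeof(double));

// ... initialise X and Y on the host ...

// Copy host → device
cudaMemcpy(X_d, X, 1024 * sizeof(double), cudaMemcpyHostToDevice);
cudaMemcpy(Y_d, Y, 1024 * sizeof(double), cudaMemcpyHostToDevice);

// 4 blocks × 256 threads = 1024 threads in total
add_kernel<<<4, 256>>>(X_d, Y_d, 1024);
cudaDeviceSynchronize();

// Copy device → host
cudaMemcpy(X, X_d, 1024 * sizeof(double), cudaMemcpyDeviceToHost);

// Release device and host memory
cudaFree(Y_d); free(Y);
cudaFree(X_d); free(X);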
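The snippet above leaves out the kernel and the host-side initialisation. The following is a minimal, self-contained sketch of the same workflow. It assumes that add_kernel adds Y to X element-wise in place (the text does not define the kernel, so this body is hypothetical), and the wrapper function run_manual_add is likewise an illustrative name. Error checking is omitted for brevity.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel body (an assumption): x[i] += y[i]
__global__ void add_kernel(double *x, const double *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += y[i];
}

// Returns 0 if every element has the expected value, 1 otherwise.
int run_manual_add()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(double);

    // Host buffers, filled with known values
    double *X = (double*)malloc(bytes);
    double *Y = (double*)malloc(bytes);
    for (int i = 0; i < n; ++i) { X[i] = 1.0; Y[i] = 2.0; }

    // Device buffers
    double *X_d = nullptr, *Y_d = nullptr;
    cudaMalloc((void**)&X_d, bytes);
    cudaMalloc((void**)&Y_d, bytes);

    // Host → device: both inputs must be on the device before the launch
    cudaMemcpy(X_d, X, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(Y_d, Y, bytes, cudaMemcpyHostToDevice);

    // 4 blocks × 256 threads cover all 1024 elements
    add_kernel<<<4, 256>>>(X_d, Y_d, n);
    cudaDeviceSynchronize();

    // Device → host: fetch the updated X
    cudaMemcpy(X, X_d, bytes, cudaMemcpyDeviceToHost);

    // With the inputs above, every element should now be 1.0 + 2.0
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (X[i] != 3.0) bad = 1;
    }

    cudaFree(X_d); cudaFree(Y_d);
    free(X); free(Y);
    return bad;
}
```

Calling run_manual_add() from main and compiling with nvcc should be enough to try it on a machine with a CUDA-capable GPU.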

The last argument of cudaMemcpy specifies the transfer direction:

Direction                     Constant
Host → Host                   cudaMemcpyHostToHost
Host → Device                 cudaMemcpyHostToDevice
Device → Host                 cudaMemcpyDeviceToHost
Device → Device               cudaMemcpyDeviceToDevice
Auto-detect (requires UVA)    cudaMemcpyDefault
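To make the direction constants concrete, here is a small sketch that sends a buffer host → device → device → host, using a different constant for each hop. The function name copy_round_trip and the buffer names are illustrative assumptions, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Round trip through two device buffers.
// Returns 0 if the data arrives back unchanged, 1 otherwise.
int copy_round_trip()
{
    const int n = 256;
    const size_t bytes = n * sizeof(double);

    double src[n], dst[n];
    for (int i = 0; i < n; ++i) { src[i] = (double)i; dst[i] = -1.0; }

    double *a_d = nullptr, *b_d = nullptr;
    cudaMalloc((void**)&a_d, bytes);
    cudaMalloc((void**)&b_d, bytes);

    // The argument order is always (destination, source, size, kind)
    cudaMemcpy(a_d, src, bytes, cudaMemcpyHostToDevice);    // host → device
    cudaMemcpy(b_d, a_d, bytes, cudaMemcpyDeviceToDevice);  // device → device
    cudaMemcpy(dst, b_d, bytes, cudaMemcpyDeviceToHost);    // device → host

    cudaFree(a_d);
    cudaFree(b_d);

    for (int i = 0; i < n; ++i) {
        if (dst[i] != src[i]) return 1;
    }
    return 0;
}
```

On a system with unified virtual addressing, each of the three constants could be replaced by cudaMemcpyDefault; spelling the direction out keeps the intent visible in the code.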

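Advantages of manual management: explicit control over when data moves; cudaMemcpy also acts as a synchronisation point, because a copy issued to the default stream does not start until previously launched kernels in that stream have finished, so once a device → host copy returns, the results are ready on the host.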

Disadvantage: the programmer must manage all transfers.
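The synchronisation behaviour can be illustrated with a short sketch in which the kernel launch is followed directly by a device → host copy, with no explicit cudaDeviceSynchronize() in between. The kernel scale_by_two and the function copy_as_sync_point are hypothetical names introduced for this example, and everything is assumed to run in the default stream.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element by two.
__global__ void scale_by_two(int *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= 2;
}

// Returns 0 if the host sees the kernel's results, 1 otherwise.
int copy_as_sync_point()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(int);

    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *v_d = nullptr;
    cudaMalloc((void**)&v_d, bytes);
    cudaMemcpy(v_d, host, bytes, cudaMemcpyHostToDevice);

    // The launch is asynchronous: control returns to the host right away.
    scale_by_two<<<(n + 255) / 256, 256>>>(v_d, n);

    // No cudaDeviceSynchronize() here. The copy goes to the same (default)
    // stream, so it starts only after the kernel has finished, and the call
    // returns only after host[] has been filled.
    cudaMemcpy(host, v_d, bytes, cudaMemcpyDeviceToHost);

    cudaFree(v_d);

    for (int i = 0; i < n; ++i) {
        if (host[i] != 2 * i) return 1;
    }
    return 0;
}
```

This ordering guarantee applies to work in the same stream; code that uses several streams or the asynchronous copy functions needs explicit synchronisation instead.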

Device memory management API

cudaError_t cudaMalloc(void** devPtr, size_t size);
// Allocate size bytes of device memory and store the address in *devPtr

cudaError_t cudaFree(void* devPtr);
// Free device memory previously allocated with cudaMalloc

cudaError_t cudaMemcpy(void* dst, const void* src,
                        size_t count, cudaMemcpyKind kind);
// Copy count bytes from src to dst in the direction given by kind

cudaError_t cudaMemset(void* devPtr, int value, size_t count);
// Set each of the first count bytes to the byte value `value`
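All four functions return a cudaError_t, which the short examples in this section ignore. A possible way to check it is sketched below. CUDA_CHECK is a hypothetical helper macro, not part of the CUDA API, and memset_demo is an illustrative function name. The example also shows that cudaMemset works byte by byte: filling an int array with the value 1 sets every byte to 0x01, so each int reads back as 0x01010101 rather than 1.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed call and leave the enclosing function.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Returns 0 if every int has the expected byte pattern, 1 otherwise.
int memset_demo()
{
    const int n = 64;
    const size_t bytes = n * sizeof(int);

    int host[n];
    int *buf_d = nullptr;

    CUDA_CHECK(cudaMalloc((void**)&buf_d, bytes));
    // Sets every BYTE to 0x01, not every int to 1
    CUDA_CHECK(cudaMemset(buf_d, 1, bytes));
    CUDA_CHECK(cudaMemcpy(host, buf_d, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(buf_d));

    for (int i = 0; i < n; ++i) {
        if (host[i] != 0x01010101) return 1;
    }
    return 0;
}
```

In this sketch an early return on error skips the cudaFree call; production code would typically release the buffer on every path.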

Keypoints

  • Manual memory management (cudaMalloc + cudaMemcpy) gives explicit control over data movement