Manual Memory Management
Objectives
Perform explicit memory allocation and data transfers with cudaMalloc/cudaMemcpy
Instructor note
15 min teaching
0 min exercises
Classical (manual) memory management
Before managed memory was available, and still today whenever full control over data movement is desired, device memory has to be allocated explicitly and data copied explicitly between host and device.
double *Y_d, *X_d, *Y, *X;
Y = (double*)malloc(1024 * sizeof(double));
X = (double*)malloc(1024 * sizeof(double));
cudaMalloc((void**)&Y_d, 1024 * sizeof(double));
cudaMalloc((void**)&X_d, 1024 * sizeof(double));
// Copy host → device (both input arrays)
cudaMemcpy(X_d, X, 1024 * sizeof(double), cudaMemcpyDefault);
cudaMemcpy(Y_d, Y, 1024 * sizeof(double), cudaMemcpyDefault);
add_kernel<<<4, 256>>>(X_d, Y_d, 1024);
cudaDeviceSynchronize();
// Copy device → host
cudaMemcpy(X, X_d, 1024 * sizeof(double), cudaMemcpyDefault);
cudaFree(Y_d); free(Y);
cudaFree(X_d); free(X);
! CUDA Fortran: arrays with the device attribute live in device memory
real(8), device :: X_d(1024)
real(8), allocatable, device :: Y_d(:)
real(8), allocatable :: Y(:), X(:)
allocate(Y(1024), Y_d(1024), X(1024))
X_d = X ! Host → device (implicit copy)
Y_d = Y
call add_kernel<<<4, 256>>>(X_d, Y_d, 1024)
error = cudaDeviceSynchronize()
X = X_d ! Device → host (implicit copy)
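The C snippet above leaves out the kernel and the host-side initialisation. A minimal self-contained sketch of the same sequence could look like the following. It assumes that `add_kernel` adds `Y` to `X` element-wise; the lesson does not define the kernel, so that body and the function name `run_manual_add` are hypothetical. Error checking is omitted to keep the sequence visible.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel (not defined in the lesson text): X[i] += Y[i].
__global__ void add_kernel(double *X, const double *Y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) X[i] += Y[i];
}

// Allocate -> copy in -> launch -> copy out -> free.
// Returns 0 if every element of X ends up equal to 3.0, 1 otherwise.
int run_manual_add(void)
{
    const int n = 1024;
    const size_t bytes = n * sizeof(double);

    // Host buffers with known contents.
    double *X = (double*)malloc(bytes);
    double *Y = (double*)malloc(bytes);
    for (int i = 0; i < n; ++i) { X[i] = 1.0; Y[i] = 2.0; }

    // Device buffers.
    double *X_d = NULL, *Y_d = NULL;
    cudaMalloc((void**)&X_d, bytes);
    cudaMalloc((void**)&Y_d, bytes);

    // Host -> device for both inputs.
    cudaMemcpy(X_d, X, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(Y_d, Y, bytes, cudaMemcpyHostToDevice);

    // 4 blocks of 256 threads cover the 1024 elements.
    add_kernel<<<4, 256>>>(X_d, Y_d, n);
    cudaDeviceSynchronize();

    // Device -> host for the result.
    cudaMemcpy(X, X_d, bytes, cudaMemcpyDeviceToHost);

    int bad = 0;
    for (int i = 0; i < n; ++i) if (X[i] != 3.0) bad = 1;

    cudaFree(X_d); cudaFree(Y_d);
    free(X); free(Y);
    return bad;
}
```

Calling `run_manual_add()` from `main` and compiling with `nvcc` should be enough to try it on a machine with a CUDA-capable GPU.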
The last argument of cudaMemcpy specifies the transfer direction:
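| Direction | Constant |
|---|---|
| Host → Host | cudaMemcpyHostToHost |
| Host → Device | cudaMemcpyHostToDevice |
| Device → Host | cudaMemcpyDeviceToHost |
| Device → Device | cudaMemcpyDeviceToDevice |
| Auto-detect (requires UVA) | cudaMemcpyDefault |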
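To make the direction argument concrete, the sketch below sends a small array through one hop per explicit direction constant and checks that the data arrives unchanged. The function name `copy_directions_demo` and the buffer names are illustrative assumptions, not part of the lesson.

```cuda
#include <cuda_runtime.h>

// Copies a small host array host -> host -> device -> device -> host,
// using one explicit cudaMemcpyKind per hop. Returns 0 on success.
int copy_directions_demo(void)
{
    const int n = 8;
    const size_t bytes = n * sizeof(double);
    double src[8], staged[8], out[8];
    for (int i = 0; i < n; ++i) { src[i] = (double)i; out[i] = -1.0; }

    double *a_d = NULL, *b_d = NULL;
    if (cudaMalloc((void**)&a_d, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void**)&b_d, bytes) != cudaSuccess) { cudaFree(a_d); return 1; }

    // Each copy runs only if the previous one succeeded.
    cudaError_t err = cudaMemcpy(staged, src, bytes, cudaMemcpyHostToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(a_d, staged, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(b_d, a_d, bytes, cudaMemcpyDeviceToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(out, b_d, bytes, cudaMemcpyDeviceToHost);

    cudaFree(a_d);
    cudaFree(b_d);
    if (err != cudaSuccess) return 1;

    // The data should survive the round trip unchanged.
    for (int i = 0; i < n; ++i) if (out[i] != src[i]) return 1;
    return 0;
}
```

Passing a kind that does not match the actual pointer locations is an error, which is one reason to spell the direction out rather than rely on auto-detection.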
Advantages of manual management: explicit control over when data moves; a blocking cudaMemcpy also acts as a synchronisation point between host and device.
Disadvantage: the programmer must allocate, transfer, and free every buffer by hand.
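The synchronisation behaviour can be illustrated with a short sketch. It assumes that the kernel launch and the copy both use the default stream, so the blocking device → host `cudaMemcpy` runs only after the kernel has finished and returns only once the data is on the host. The kernel `square_index` and the function `memcpy_sync_demo` are hypothetical names chosen for this example.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: writes i*i into out[i].
__global__ void square_index(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * i;
}

// Returns 0 if the host sees all kernel results after the blocking copy.
int memcpy_sync_demo(void)
{
    const int n = 1024;
    int host[1024];
    int *dev = NULL;
    if (cudaMalloc((void**)&dev, n * sizeof(int)) != cudaSuccess) return 1;

    // The launch is asynchronous: control returns to the host immediately.
    square_index<<<4, 256>>>(dev, n);

    // No cudaDeviceSynchronize() here. The copy is queued behind the kernel
    // on the default stream and blocks the host until it has completed.
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i) if (host[i] != i * i) return 1;
    return 0;
}
```

An explicit `cudaDeviceSynchronize()` before the copy is still harmless and can make the intent clearer to readers.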
Device memory management API
cudaError_t cudaMalloc(void** devPtr, size_t size);
// Allocate size bytes of device memory
cudaError_t cudaFree(void* devPtr);
// Free device memory previously allocated with cudaMalloc
cudaError_t cudaMemcpy(void* dst, const void* src,
                       size_t count, cudaMemcpyKind kind);
// Copy count bytes from src to dst
cudaError_t cudaMemset(void* devPtr, int value, size_t count);
// Set each of the first count bytes to the byte value given by value
integer function cudaMalloc(devptr, count)
! devptr: Fortran objects → count in elements
! devptr: TYPE(C_DEVPTR) → count in bytes
integer function cudaFree(devptr)
integer function cudaMemcpy(dst, src, count, kdir)
! dst/src: Fortran objects → count in elements
! dst/src: TYPE(C_DEVPTR) → count in bytes
integer function cudaMemset(devptr, value, count)
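Every C function listed above returns a `cudaError_t`, so each call can be checked. The sketch below wraps the calls in a small macro and uses `cudaMemset` to fill a device buffer with a byte pattern. The macro `CUDA_CHECK` and the function `memset_demo` are hypothetical helpers written for this example, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA API): on failure, print
// the CUDA error string and return 1 from the enclosing function.
// For brevity the early return does not free buffers allocated earlier.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Fills a device buffer with the byte 0xAB, copies it back, and verifies it.
// Returns 0 on success, 1 on any CUDA error or mismatch.
int memset_demo(void)
{
    const int n = 256;
    unsigned char host[256];
    unsigned char *dev = NULL;

    CUDA_CHECK(cudaMalloc((void**)&dev, n));
    CUDA_CHECK(cudaMemset(dev, 0xAB, n));   // count is in bytes
    CUDA_CHECK(cudaMemcpy(host, dev, n, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));

    for (int i = 0; i < n; ++i) if (host[i] != 0xAB) return 1;
    return 0;
}
```

Because `cudaMemset` works byte by byte, it suits zeroing buffers or writing byte patterns; applying it with value 1 to an `int` array yields 0x01010101 in each element, not 1.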
Keypoints
Manual memory management (cudaMalloc + cudaMemcpy) gives explicit control over data movement