Pinned and Constant Memory¶
Objectives
Know when and how to use pinned (page-locked) and constant memory
Instructor note
15 min teaching
0 min exercises
Page-locked (pinned) memory¶
Regular host memory allocated with malloc may be paged out by the OS. When cudaMemcpy copies data from pageable memory, it must first copy it to a pinned staging buffer, then transfer it over PCIe — this adds overhead.
Pinned (page-locked) memory is guaranteed to stay in physical memory and can be transferred directly, providing higher bandwidth.
Advantages:
Faster copy to/from device (no staging buffer needed)
Can be accessed directly from device code (with UVA, no explicit
cudaHostGetDevicePointerneeded)Required for asynchronous memory copies (covered in the Streams episode)
Caveats:
Be careful with race conditions when accessing mapped pinned memory from both host and device
CPU locks are not visible to the GPU; GPU atomics do not work on pinned host memory
Access over PCIe is slow compared to device memory
CUDA Fortran does not allow pinned memory in kernel arguments
cudaError_t cudaHostAlloc(void** pHost, size_t size, unsigned int flags);
cudaError_t cudaMallocHost(void** ptr, size_t size);
cudaError_t cudaFreeHost(void* ptr);
// flags: cudaHostAllocMapped to map into device address space
real, allocatable, pinned :: q(:)
allocate(q(1024))
Constant memory¶
Constant memory is a special read-only memory space that resides in global memory but uses a dedicated constant cache on each SM (8 KB). It is useful for data that is:
Read by all threads
Never written from device code
Small (maximum 64 KB per device)
Recall that variables passed by value to kernels are automatically stored in constant memory.
__constant__ float constData[256];
__global__ void myKernel() {
float a = constData[0]; // Read from constant memory
}
// Host code: copy to/from constant memory
float data[256];
cudaMemcpyToSymbol(constData, data, sizeof(data), 0, cudaMemcpyDefault);
cudaMemcpyFromSymbol(data, constData, sizeof(data), 0, cudaMemcpyDefault);
module kernels
real :: c_d(100)
attributes(constant) :: c_d
attributes(global) subroutine init()
real :: c2
c2 = c_d(2) ! Read from constant memory
end subroutine
end module
program main
use kernels
real :: c(100)
c_d = c ! Copy host → constant memory
end program
Restrictions:
Cannot be dynamically allocated
Maximum 64 KB per device
Must not be written from device code
In C/C++ host code, constant memory variables are symbols — do not use pointer arithmetic on them
Keypoints
Pinned memory enables faster transfers and is required for asynchronous copies
Constant memory provides cached, read-only access for small data (max 64 KB)