Why Need Memory Alignment
- Nowadays computer processor does not read from and write to memory in byte-sized chunks. Instead, it accesses memory in two-, four-, eight- 16- or even 32-byte chunks granularity. So accessing unaligned memory would give us great overhead.
- All modern processors offer atomic instructions. These special instructions are crucial for synchronizing two or more concurrent tasks. For atomic instructions to perform correctly, the addresses you pass them must be at least four-byte aligned(to avoid memory access across pages). Otherwise, it would cause failure, or worse, silent corruption.
- Some instructions(like some AVX-512 instructions) are designed to have memory alignment requirements, for speed concerns.
- modern compilers sometimes automatically padded the structure for backward compatibility and efficiency concerns.
- cache line: alignment of data may determine whether an operation touches one or two cache lines. Reducing false sharing problem.
C++ in Practice
In most of the cases, C++ itself has already dealt with the memory alignment automatically. But sometimes we need better control of the memory arrangement to achieve better performance. Moreover, as any overzealous C++ programmer would do. We want to understand anything behind the programming language so we can abuse it.
specify alignment requirement for structure(on the stack)
An object, in C, is region of data storage in the execution environment. So every object has size(can be determined with sizeof
) and alignment requirement (can be determined by alignof
(since C11)) attributes. Each basic type has a default alignment, meaning that it will unless otherwise requested by the programmer, be aligned on a pre-determined boundary. The only notable differences in alignment for an LP64 64-bit system when compared to a 32-bit system are:
type | 32-bit | 64-bit |
---|---|---|
long | 4-byte | 8-byte |
double | 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with -malign-double compile time option) | 8-byte |
long long | 4-byte | 8-byte |
long double | 8-byte aligned with Visual C++, and 4-byte aligned with GCC | 8-byte aligned with Visual C++ and 16-byte aligned with GCC |
pointer | 4-byte | 8-byte |
Although the compiler normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition, the data structure as a whole may be padded with a final unnamed member. This allows each member of an array of structures to be properly aligned.
Padding is only inserted when a structure member is followed by a member with a larger alignment requirement or at the end of the structure. By changing the ordering of members in a structure, it is possible to change the amount of padding required to maintain alignment. For example, if members are sorted by descending alignment requirements a minimal amount of padding is required. The minimal amount of padding required is always less than the largest alignment in the structure. Computing the maximum amount of padding required is more complicated, but is always less than the sum of the alignment requirements for all members minus twice the sum of the alignment requirements for the least aligned half of the structure members.
For example, here is a structure with members of various types, totaling 8 bytes before compilation:
1 | struct MixedData |
After compilation the data structure will be supplemented with padding bytes to ensure a proper alignment for each of its members:
1 | struct MixedData /* After compilation in 64-bit x86 machine */ |
Also, we could use pragma pack(n)
to specify the packing alignment for structure, union, and class members. n
becomes the new packing alignment value. Moreover, we could also use #pragma pack(1)
to not align anything.
alignas
since c++11, we could use alignas
to Specify the alignment requirement of a type or an object. If multiple alignas
are met, the strictest(largest) alignment would be chosen.
1 | struct alignas(16) Bar |
memory alignment for heap memory allocation
The address of a block returned by malloc
or realloc
in GNU systems is always a multiple of eight (or 16 on 64-bit systems). If we need a block whose address is a multiple of a higher power of two than that, use aligned_alloc
or posix_memalign
. (aligned_alloc
and posix_memalign
are declared in stdlib.h
)
if you use a gcc/clang compiler supporting C++17 and above, you can use aligned_alloc
to get spcific alignment.
1 | void *aligned_alloc( size_t alignment, size_t size ); |
And C++17 also have a new feature called aligned new to support allocation alignment:
1 | void* operator new ( std::size_t count, std::align_val_t al ); |
Moreover, in C++17(GCC>=7, clang>5, MSVC>=19.12) the standard allocators have been updated to respect type’s alignment, so containers can allocate appropriate memory meets memory alignment requirement.
1 | class alignas(32) Vec3d { |