A C++-related post for a change! I refuse to spend whatever scraps of free time I can find on Flash these days, preferring instead to play around with DirectX 11 and tinker away on a playground engine dubbed “Helix”. Only recently did it come to my attention that there’s already a 3D game engine named “Helix” on Google Code. It hasn’t been updated in 5 years, and since I currently have no intention of publishing any of my stuff aside from the occasional snippet, I decided to go ahead and not care about names. Otherwise, there’s always Namingway! Anyway, I digress. I’m having fun being forced to learn new things, and I hope to share a few things now and then (reminds me of the olden days of this blog!). On to memory alignment issues…
A quick alignment intro. Certain data types have strict memory alignment requirements, especially when using SSE intrinsics. In this particular case, types such as __m128 require objects to be 16-byte aligned. This is automatically the case for variables on the stack on x86 and x64 architectures, but when your variable is a member of a class or allocated from the heap there’s no such guarantee. In MSVC, you can force the compiler to align your variables (or an entire class) correctly with respect to the class layout using __declspec(align(#)), where # is the alignment in bytes. As long as the container class is aligned, the member will be as well.
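To make the static side of this concrete, here is a minimal sketch. It assumes the standard C++11 alignas specifier as a portable stand-in for __declspec(align(16)); Vec4, Container, IsAligned and StackMemberIsAligned are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a type with a 16-byte requirement (such as __m128).
// alignas(16) is assumed here as the portable counterpart of __declspec(align(16)).
struct alignas(16) Vec4 { float x, y, z, w; };

// The member's requirement propagates to the class that contains it.
struct Container {
    int someObject;
    Vec4 aligned;
};

// True when p is a multiple of alignment (alignment must be a power of two).
inline bool IsAligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// The compiler pads the layout so the member sits at a 16-byte offset...
static_assert(alignof(Container) == 16, "container inherits the member's alignment");
static_assert(offsetof(Container, aligned) == 16, "padding is inserted after the int");

// ...and aligns the container itself when it lives on the stack.
inline bool StackMemberIsAligned() {
    Container c;
    return IsAligned(&c.aligned, 16);
}
```

The two static_asserts only describe the layout; they say nothing about where a heap allocation ends up.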
Whenever there’s an alignment requirement and the object is in fact a class member, it imposes the same alignment requirement on the container class. The image below shows what happens if the container doesn’t share the alignment. This can easily start affecting large parts of your code-base, even in places that ostensibly have nothing to do with such an alignment. Alignment bugs are irregular, they don’t tell you they’re alignment issues (they usually show up as a seemingly unrelated access violation), and as such they’re very hard to track down. If (no: when) any class up the containment chain is not correctly aligned, you’ll be hunting down strange illegal access errors all day.

When class Container is allocated with a different alignment (at location 0x24), member “aligned” is no longer 16-byte aligned (residing at address 0x40).
__declspec(align(#)) does not affect dynamic allocations, so you’d need to implement a custom allocation scheme to make sure the object is created at the correct location, and you’d need to do this for every class that is virally affected. To simplify dynamic allocations, it would be tempting to overload the global new and delete operators to always ensure alignment whether it’s necessary or not. However, you may not want to overload the global operators: there’s always 16 bytes of wasted padding for every allocated object, and overriding global operators may be generally undesirable as it affects unrelated code. Besides, this only handles dynamic allocations; you still need to use __declspec(align(#)) all over the place to ensure static alignment.
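For reference, a per-class “custom allocation scheme” of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not code from Helix: Particle and Vec4 are hypothetical, and the over-allocate-and-round-up trick with std::malloc is just one portable way to do it (on MSVC, _aligned_malloc and _aligned_free would do the same job).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical 16-byte aligned type (alignas assumed in place of __declspec(align(16))).
struct alignas(16) Vec4 { float x, y, z, w; };

// Hypothetical class that needs aligned heap allocations because of its member.
class Particle {
public:
    static void* operator new(std::size_t size) {
        const std::size_t alignment = 16;
        // Room for the object, worst-case padding, and the saved raw pointer.
        void* raw = std::malloc(size + alignment + sizeof(void*));
        if (!raw) throw std::bad_alloc();
        std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        // Round up to the next multiple of alignment.
        std::uintptr_t aligned =
            (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        assert(aligned % alignment == 0);
        // Remember the pointer malloc returned, just in front of the object.
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<void*>(aligned);
    }

    static void operator delete(void* p) {
        if (p) std::free(static_cast<void**>(p)[-1]);
    }

    Vec4 position;
};

// Allocates one Particle on the heap and reports whether it landed on a 16-byte boundary.
inline bool HeapParticleIsAligned() {
    Particle* p = new Particle();
    bool ok = reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
    delete p;
    return ok;
}
```

Every class that contains a Particle by value would need the same pair of operators, which is exactly the viral effect described above.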
To reduce mistakes and bug hunts, we wish to restrict the alignment requirements to where they are directly needed. The solution is pretty trivial, but since I couldn’t find any useful articles after a quick Google session, I decided to share my approach. I created a proxy template class simply called “Aligned”. See below (I’m skipping out on some best practices, etc.; feel free to complain about that):
#ifndef __HELIX_ALIGNED__
#define __HELIX_ALIGNED__

#include <cstddef> // std::size_t
#include <new>     // placement new

namespace helix
{
    /**
     * A wrapper for class properties that have alignment requirements
     * Type: The type of the aligned object
     * alignment: The alignment in bytes (defaults to 16), must be a power of two
     */
    template<class Type, unsigned int alignment = 16>
    class Aligned
    {
    public:
        Aligned();
        Aligned(const Type& source); // allow construction from non-wrapped objects
        Aligned(const Aligned<Type, alignment>& source);
        ~Aligned();

        Aligned& operator=(const Aligned<Type, alignment>& source);
        Aligned& operator=(const Type& source); // allow assignment of non-wrapped objects

        // dereference operator to get to actual object
        inline Type& operator*() { return *object; }
        inline const Type& operator*() const { return *object; }

        // member access operator for base object
        inline Type* operator->() { return object; }
        inline const Type* operator->() const { return object; }

    private:
        // allocate statically to keep class layout coherent, we only need as much padding as the alignment value
        char block[sizeof(Type) + alignment];
        Type* object;

        void* GetAlignedPointer();
    };

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>::Aligned() :
        object(0)
    {
        void* ptr = GetAlignedPointer();
        object = new (ptr) Type();
    }

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>::Aligned(const Type& source) :
        object(0)
    {
        void* ptr = GetAlignedPointer();
        object = new (ptr) Type(source);
    }

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>::Aligned(const Aligned<Type, alignment>& source) :
        object(0)
    {
        // construct a fresh copy in our own block; never copy the source's pointer
        void* ptr = GetAlignedPointer();
        object = new (ptr) Type(*source);
    }

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>::~Aligned()
    {
        // placement new requires an explicit destructor call
        object->~Type();
    }

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>& Aligned<Type, alignment>::operator=(const Aligned<Type, alignment>& source)
    {
        *object = *source;
        return *this;
    }

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>& Aligned<Type, alignment>::operator=(const Type& source)
    {
        *object = source;
        return *this;
    }

    template<class Type, unsigned int alignment>
    inline void* Aligned<Type, alignment>::GetAlignedPointer()
    {
        // offset to the next address that is a multiple of alignment;
        // an already aligned block yields a full alignment of padding, which still fits
        std::size_t padding = alignment - (reinterpret_cast<std::size_t>(block) & (alignment - 1));
        return block + padding;
    }
}

#endif
The class is used as follows (not passing the alignment value assumes a default value of 16):
class Container
{
    // ...
    int someObject;
    Aligned<Type, 16> aligned;
};
Access to the aligned object is the same as with pointers through the dereference operator (*aligned) and the member access operator (aligned->member).
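To check the idea in isolation, the sketch below places a container at a deliberately misaligned address and verifies that the wrapped member still ends up on a 16-byte boundary. AlignedSketch is a condensed, non-copyable stand-in for the Aligned class above (construction and access only), and Vec4, Container and MemberStaysAligned are hypothetical names; alignas is assumed in place of __declspec(align(16)).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical type with a 16-byte requirement.
struct alignas(16) Vec4 { float x, y, z, w; };

// Condensed stand-in for the Aligned wrapper: same block-plus-padding approach.
template<class Type, std::size_t alignment = 16>
class AlignedSketch {
public:
    AlignedSketch() : object(0) {
        std::uintptr_t address = reinterpret_cast<std::uintptr_t>(block);
        std::size_t padding = alignment - (address & (alignment - 1));
        object = new (block + padding) Type();
        assert(reinterpret_cast<std::uintptr_t>(object) % alignment == 0);
    }
    ~AlignedSketch() { object->~Type(); }

    // Copying is left out of this sketch; the full class handles it.
    AlignedSketch(const AlignedSketch&) = delete;
    AlignedSketch& operator=(const AlignedSketch&) = delete;

    Type& operator*() { return *object; }
    Type* operator->() { return object; }

private:
    char block[sizeof(Type) + alignment];
    Type* object;
};

// The container itself carries no 16-byte requirement anymore.
struct Container {
    int someObject;
    AlignedSketch<Vec4> aligned;
};

// Constructs a Container at an address that is not a multiple of 16
// and reports whether the wrapped Vec4 is aligned regardless.
inline bool MemberStaysAligned() {
    alignas(16) unsigned char storage[sizeof(Container) + 16];
    Container* c = new (storage + alignof(Container)) Container();
    bool containerMisaligned = reinterpret_cast<std::uintptr_t>(c) % 16 != 0;
    c->aligned->x = 1.0f; // member access goes through operator->
    bool memberAligned = reinterpret_cast<std::uintptr_t>(&*c->aligned) % 16 == 0;
    c->~Container();
    return containerMisaligned && memberAligned;
}
```

Note that alignof(Container) is now only as strict as the pointer member, which is why offsetting the buffer by that amount is legal for the container yet breaks 16-byte alignment.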
The Aligned class creates a memory block to contain the aligned object and some padding. Upon construction, the object checks the block’s address and finds the next correctly aligned address. No matter how the container was created, the resulting location will be safe for constructing the to-be-aligned object. The distance to the next aligned byte obviously can’t be larger than the alignment itself, which is why we use that as the amount of padding.
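The arithmetic in GetAlignedPointer() can be tried out on plain numbers. The two helpers below are hypothetical and exist only for this illustration; they mirror the same expression on integer addresses instead of real pointers.

```cpp
#include <cassert>
#include <cstdint>

// Same expression as in GetAlignedPointer(): the result is always in [1, alignment].
inline std::uintptr_t PaddingFor(std::uintptr_t blockAddress, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two only
    return alignment - (blockAddress & (alignment - 1));
}

// Address at which the wrapped object would be constructed.
inline std::uintptr_t AlignedAddressFor(std::uintptr_t blockAddress, std::uintptr_t alignment) {
    return blockAddress + PaddingFor(blockAddress, alignment);
}

// Worked examples for alignment = 16:
//   block at 0x24: 0x24 & 0xF = 4,  padding = 12, object at 0x30
//   block at 0x2F: 0x2F & 0xF = 15, padding = 1,  object at 0x30
//   block at 0x40: 0x40 & 0xF = 0,  padding = 16, object at 0x50
```

The last case shows that an already aligned block still skips a full 16 bytes, which is why the block is declared with sizeof(Type) + alignment bytes rather than sizeof(Type) + alignment - 1.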
Note that the block of memory is declared statically as a member array rather than created with new. This ensures that stack-based objects remain stack-based and that the location of the aligned object is still coherent with the container class’s layout. Since the object sits right next to the container’s other members, there will be less chance of a cache miss.
Since every usage of Aligned introduces some padding, it’d still be a waste if it happens more than once for a single container. You can group together objects with the same requirement in a struct, and wrap that in an Aligned proxy:
class ContainerClass
{
    struct Properties {
        // assuming Type is declared using __declspec(align(16))
        Type obj1;
        Type obj2;
        Type obj3;
    };
    // ...
    Aligned<Properties, 16> props;
};
If you already adhere to the pimpl idiom, things should become easy enough by simply replacing the implementation struct with Aligned<Impl>.
It would make sense to create a similar solution for arrays, so you wouldn’t create an array of Aligned with padding waste per element. This should be a trivial variation.
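One possible shape for that array variation is sketched below. This is my guess at a reasonable design rather than tested engine code: AlignedArray, Vec4, Holder and ArrayElementsAreAligned are hypothetical names, copying is simply disabled, and alignas is assumed in place of __declspec(align(16)).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical element type with a 16-byte requirement.
struct alignas(16) Vec4 { float x, y, z, w; };

// Fixed-size array that pays for the alignment padding once instead of per element.
template<class Type, std::size_t count, std::size_t alignment = 16>
class AlignedArray {
    // Consecutive elements only stay aligned if the element size is a multiple of alignment.
    static_assert(sizeof(Type) % alignment == 0, "element size must be a multiple of alignment");
public:
    AlignedArray() : first(0) {
        std::uintptr_t address = reinterpret_cast<std::uintptr_t>(block);
        std::size_t padding = alignment - (address & (alignment - 1));
        first = reinterpret_cast<Type*>(block + padding);
        for (std::size_t i = 0; i < count; ++i) new (first + i) Type();
    }
    ~AlignedArray() {
        // Destroy in reverse order of construction.
        for (std::size_t i = count; i > 0; --i) first[i - 1].~Type();
    }

    // Copying is left out of this sketch.
    AlignedArray(const AlignedArray&) = delete;
    AlignedArray& operator=(const AlignedArray&) = delete;

    Type& operator[](std::size_t index) { assert(index < count); return first[index]; }
    const Type& operator[](std::size_t index) const { assert(index < count); return first[index]; }

private:
    char block[sizeof(Type) * count + alignment];
    Type* first;
};

// Hypothetical container whose leading char would otherwise throw off the layout.
struct Holder {
    char tag;
    AlignedArray<Vec4, 4> items;
};

// Reports whether every element of the array sits on a 16-byte boundary.
inline bool ArrayElementsAreAligned() {
    Holder h;
    for (std::size_t i = 0; i < 4; ++i) {
        h.items[i].x = static_cast<float>(i);
        if (reinterpret_cast<std::uintptr_t>(&h.items[i]) % 16 != 0) return false;
    }
    return h.items[3].x == 3.0f;
}
```

With four Vec4 elements this spends 16 bytes of padding in total, where four separate wrappers would spend 64.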
I hope the post was useful for some. Until next time!




