Der Schmale – David Lenaerts’s blog

Flash Platform Experiments

Dealing with the virality of memory alignment

A C++-related post for a change! I refuse to spend whatever scraps of free time I can find on Flash these days, instead preferring to play around with DirectX 11 and tinker away on a playground engine dubbed “Helix”. Only recently it came to my attention that there’s already a 3D game engine named “Helix” on Google Code. It hasn’t been updated in 5 years and since I currently have no intention of publishing any of my stuff aside from the occasional snippet, I decided to go ahead and not care about names. Otherwise, there’s always Namingway! Anyway, I digress... I’m having fun being forced to learn new things, and I hope to share a few things now and then (reminds me of the olden days of this blog!). On to memory alignment issues…

A quick alignment intro. Certain data types have strict memory alignment requirements, especially when using SSE intrinsics. In this particular case, types such as __m128 must be 16-byte aligned. The compiler takes care of this automatically for variables on the stack on x86 and x64 architectures, but when your variable is a member of a class or allocated from the heap there’s no such guarantee. In MSVC, you can force the compiler to align your variables (or an entire class) correctly with respect to the class layout using __declspec(align(#)). As long as the container class is aligned, the member will be as well.
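To make the intro a bit more concrete, here is a minimal sketch of declared alignment and how to check it. It assumes a C++11 compiler and uses the standard alignas(16), which plays the same role as MSVC’s __declspec(align(16)). Vec4, Holder, isAligned and stackInstanceIsAligned are made-up names for illustration; Vec4 merely stands in for a type like __m128.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable stand-in for an SSE vector type such as __m128.
struct alignas(16) Vec4
{
    float x, y, z, w;
};

// A container inherits the strictest alignment of its members.
struct Holder
{
    int  tag;
    Vec4 value;
};

static_assert(alignof(Vec4) == 16, "Vec4 must be 16-byte aligned");
static_assert(alignof(Holder) == 16, "the requirement propagates to the container");
static_assert(offsetof(Holder, value) % 16 == 0, "member offset is a multiple of 16");

// Returns true when ptr is a multiple of alignment (a power of two).
inline bool isAligned(const void* ptr, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Stack instances honour the declared alignment.
inline bool stackInstanceIsAligned()
{
    Holder h;
    h.tag = 0;
    return isAligned(&h, 16) && isAligned(&h.value, 16);
}
```

The static_asserts only cover what the compiler controls (layout and stack placement). They say nothing about where a heap allocation ends up, which is where the trouble starts.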

Whenever there’s an alignment requirement and the object is in fact a class member, it imposes the same alignment requirement on the container class. The image below shows what happens if the container doesn’t share the alignment. This can easily start affecting large parts of your code-base, even in places that ostensibly have nothing to do with such an alignment. Alignment bugs are irregular, they don’t tell you they’re alignment issues (they usually show up as a seemingly unrelated access violation) and as such they’re very hard to track down. If (no: when) any class up the containment chain is not correctly aligned, you’ll be hunting down strange illegal access errors all day.

When class Container is allocated with a different alignment (at location 0x24), member “aligned” is no longer 16-byte aligned (residing at address 0x40).

__declspec(align(#)) does not affect dynamic allocations, so you’d need to implement a custom allocation scheme to make sure the object is created at the correct location, and you’d need to do so for every class that is virally affected. To simplify dynamic allocations, it would be tempting to overload the global new and delete operators to always ensure alignment whether it’s necessary or not. However, you may not want to overload the global operators; there are always 16 bytes of wasted padding for every allocated object, and overriding global operators may be generally undesirable as it affects unrelated code. Besides, this only handles dynamic allocations; you still need to use __declspec(align(#)) all over the place to ensure static alignment.
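For reference, a per-class allocation scheme of the kind described above could look roughly like this. It is only a sketch: alignedAlloc, alignedFree and Particle are hypothetical names, and the helpers over-allocate with malloc and stash the original pointer in front of the aligned block. On MSVC, _aligned_malloc and _aligned_free could take the place of the two helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Allocates size bytes at an address that is a multiple of alignment.
// alignment must be a power of two and at least sizeof(void*).
// The pointer returned by malloc is stored just before the aligned block
// so that alignedFree can find it again.
inline void* alignedAlloc(std::size_t size, std::size_t alignment)
{
    assert((alignment & (alignment - 1)) == 0 && alignment >= sizeof(void*));

    void* raw = std::malloc(size + alignment + sizeof(void*));
    if (!raw) throw std::bad_alloc();

    std::uintptr_t start   = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned = (start + alignment - 1) & ~(std::uintptr_t(alignment) - 1);

    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

inline void alignedFree(void* ptr)
{
    if (ptr) std::free(static_cast<void**>(ptr)[-1]);
}

// Every class with an alignment requirement needs its own pair of operators.
struct alignas(16) Particle
{
    float position[4];
    float velocity[4];

    static void* operator new(std::size_t size) { return alignedAlloc(size, 16); }
    static void  operator delete(void* ptr)     { alignedFree(ptr); }
};

inline bool heapInstanceIsAligned()
{
    Particle* p = new Particle();
    bool ok = (reinterpret_cast<std::uintptr_t>(p) % 16) == 0;
    delete p;
    return ok;
}
```

Note how the operators have to be repeated (or inherited) in every class up the containment chain that may be created with new, which is exactly the viral spread this post tries to avoid.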

To reduce mistakes and bug hunts, we wish to restrict the alignment requirements to where they are directly needed. The solution is pretty trivial, but since I couldn’t find any useful articles after a quick Google session, I decided to share my approach. I created a proxy template class simply called “Aligned”. See below: (I’m skipping out on some best practices, etc, feel free to complain about that ;) )

#ifndef HELIX_ALIGNED_H
#define HELIX_ALIGNED_H

#include <cstddef> // size_t
#include <new>     // placement new

namespace helix
{
    /**
     * A wrapper for class properties that have alignment requirements
     * Type: The type of the aligned object
     * alignment: The alignment in bytes (defaults to 16)
     */

    template<class Type, unsigned int alignment = 16>
    class Aligned
    {
    public:
        Aligned();
        Aligned(const Type& source);    // allow construction from non-wrapped objects
        Aligned(const Aligned<Type, alignment>& source);
        ~Aligned();
       
        Aligned& operator=(const Aligned<Type, alignment>& source);
        Aligned& operator=(const Type& source); // allow assignment of non-wrapped objects

        // dereference operator to get to actual object
        inline Type& operator*() { return *object; }
        inline const Type& operator*() const { return *object; }

        // member access operator for base object
        inline Type* operator->() { return object; }
        inline const Type* operator->() const { return object; }

    private:
        // allocate statically to keep class layout coherent, we only need as much padding as the alignment value
        char block[sizeof(Type) + alignment];
        Type* object;

        void* GetAlignedPointer();
    };

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>::Aligned() :
        object(0)
    {      
        void* ptr = GetAlignedPointer();
        object = new (ptr) Type();     
    }

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>::Aligned(const Type& source) :
        object(0)
    {
        void* ptr = GetAlignedPointer();
        object = new (ptr) Type(source);
    }

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>::Aligned(const Aligned<Type, alignment>& source) :
        object(0)
    {      
        void* ptr = GetAlignedPointer();
        object = new (ptr) Type(*source);
    }

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>::~Aligned()
    {
        object->~Type();
    }

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>& Aligned<Type, alignment>::operator=(const Aligned<Type, alignment>& source)
    {
        *object = *source;
        return *this;
    }

    template<class Type, unsigned int alignment>
    Aligned<Type, alignment>& Aligned<Type, alignment>::operator=(const Type& source)
    {
        *object = source;
        return *this;
    }

    template<class Type, unsigned int alignment>
    inline void* Aligned<Type, alignment>::GetAlignedPointer()
    {
        // offset to the next address that is a multiple of alignment (which must be a power of two)
        size_t padding = alignment - (size_t(block) & (alignment - 1));
        return block + padding;
    }
}

#endif

The class is used as follows (not passing the alignment value assumes a default value of 16):

class ContainerClass
{
// ...
    int someObject;
    Aligned<Type, 16> aligned;
};

Access to the aligned object is the same as with pointers through the dereference operator (*aligned) and the member access operator (aligned->member).

The Aligned class creates a memory block to contain the aligned object and some padding. Upon construction, the object checks the block’s address and finds the next correctly aligned address. No matter how it was created, the resulting location will be safe to construct the to-be-aligned object. The distance to the next aligned byte obviously can’t be larger than the alignment itself, which is why we use that for padding.
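The padding arithmetic is easy to check in isolation. The sketch below uses the same formula as GetAlignedPointer on plain integers; paddingFor and nextAligned are hypothetical helper names, and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Distance from address to the next multiple of alignment, using the same
// formula as GetAlignedPointer. The result lies in [1, alignment]: an address
// that is already aligned is pushed forward by a full alignment.
inline std::size_t paddingFor(std::uintptr_t address, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return alignment - (address & (alignment - 1));
}

// The address at which the wrapped object would be constructed.
inline std::uintptr_t nextAligned(std::uintptr_t address, std::size_t alignment)
{
    return address + paddingFor(address, alignment);
}
```

For a block starting at 0x24 with a 16-byte alignment, 0x24 & 15 is 4, so the padding is 12 and the object lands at 0x30. A block that already starts at 0x30 gets the full 16 bytes of padding and the object lands at 0x40, which still fits because the block reserves sizeof(Type) + alignment bytes.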

Note that the block of memory is a plain member array rather than something created with new. This ensures that stack-based objects remain stack-based and that the location of the aligned object stays coherent with the container class’s layout, so there is less chance of a cache miss.

Since every usage of Aligned introduces some padding, it’d still be a waste if it happens more than once for a single container. You can group together objects with the same requirement in a struct, and wrap that in an Aligned proxy:

// ContainerClass is free to use whatever alignment it pleases
class ContainerClass
{
    struct Properties {
        // assuming Type is declared using __declspec(align(16))
        Type obj1;
        Type obj2;
        Type obj3;
    };
    // ...
    Aligned<Properties, 16> props;
};

If you already adhere to the pimpl idiom, things should become easy enough by simply replacing the implementation struct with Aligned<Impl>.

It would make sense to create a similar solution for arrays, so you wouldn’t create an array of Aligned with padding waste per element. This should be a trivial variation.
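One possible shape for that array variation, as a rough sketch rather than anything taken from Helix: AlignedArray and Float4 are hypothetical names, copying is left out, and a C++11 compiler is assumed (alignas, static_assert, = delete). The whole array shares a single pad, which only works if the element size is a multiple of the alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical array counterpart of Aligned: one block, one pad, count elements.
template<class Type, std::size_t count, std::size_t alignment = 16>
class AlignedArray
{
public:
    AlignedArray() : objects(0)
    {
        // Consecutive elements only stay aligned if the element size is a
        // multiple of the alignment.
        static_assert(sizeof(Type) % alignment == 0,
                      "sizeof(Type) must be a multiple of the alignment");

        std::size_t padding =
            alignment - (reinterpret_cast<std::uintptr_t>(block) & (alignment - 1));
        objects = reinterpret_cast<Type*>(block + padding);

        for (std::size_t i = 0; i < count; ++i)
            new (objects + i) Type();
    }

    ~AlignedArray()
    {
        // destroy in reverse order of construction
        for (std::size_t i = count; i > 0; --i)
            objects[i - 1].~Type();
    }

    // Copying is left out to keep the sketch short.
    AlignedArray(const AlignedArray&) = delete;
    AlignedArray& operator=(const AlignedArray&) = delete;

    Type& operator[](std::size_t index)
    {
        assert(index < count);
        return objects[index];
    }

    const Type& operator[](std::size_t index) const
    {
        assert(index < count);
        return objects[index];
    }

private:
    // a single pad of alignment bytes for the whole array
    char block[sizeof(Type) * count + alignment];
    Type* objects;
};

// Example element type standing in for something like __m128.
struct alignas(16) Float4
{
    float v[4];
};

inline bool allElementsAligned()
{
    AlignedArray<Float4, 3> items;
    for (std::size_t i = 0; i < 3; ++i)
        if (reinterpret_cast<std::uintptr_t>(&items[i]) % 16 != 0)
            return false;
    return true;
}
```

Compared to an array of Aligned<Float4>, this spends 16 bytes of padding once instead of once per element, and it drops the per-element pointer as well.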

I hope the post was useful for some. Until next time!

Speaking at FMX 2013

Albeit with trembling knees, I’m very excited to announce I’ll be speaking at FMX this year, ‘Europe’s most renowned conference on digital entertainment’. It’s an intimidating place to be, with many of my heroes either in the organisation or in the speaker list! Luckily I won’t be alone; the session will be part of the Procedural Animation track, where a couple of dear friends are lined up as well. Curated by good ol’ Frank, the track is focused on rendering and animation outside the realms of gaming and film.

I’ll be presenting an adapted version of A Trick of Light that I tried out at Reasons to be Creative, which is an introduction to the world of programming shading and lighting without touching too much on the practical programming side of things. So no code but concepts, intuition and of course a wee bit of maths. This session is not meant for the shader programming veteran, but to provide a starting point for anyone interested in introducing some light to their procedural work.

I hope to see you there; conference passes are ridiculously cheap. I know I’ll be gawking at all the other presentations!

Some other news

While I’m at it, here are some other updates! I’ve recently joined the motley crew of Psykosoft to work on some upcoming projects. I’m keeping tight-lipped about the details for a while, but maybe some people can already imagine some of the things I’m having fun with.

Alongside that, I’m also working on the Away3D 4.1 beta to implement some optimizations, features and bug fixes. So keep an eye out for that!

As you can guess, busy times!

Another Take on Skin Rendering

It’s been a while since I’ve made a blog post about something I’ve played around with just for the heck of it. With Away3D 4.1 Alpha pushed out yesterday, I decided to spend the day revisiting an old friend: skin rendering. Remember the blog post on skin rendering when Away3D 4 was still codenamed Broomstick over a year and a half ago? (ugh, time flies!) A lot has changed in the engine, and many new features have been added. It therefore made sense to build a new but similar example showing some newer additions, and highlighting how things were put together. There are two variations of the demo:

  • With a familiar face: Using the same Lee Perry-Smith model as before, for direct comparison (and because it’s free and I’m a stingy wretch)
  • With an unfamiliar face: Using a free (and insanely detailed) model from TurboSquid because I’m getting a bit sick of the old head (no offence intended to Mr. Perry-Smith). And again, it’s free and I’m a stingy wretch!

As usual, click+drag to move the camera.

There’s source right here. There’s a bunch of boilerplate in there, but the only thing of interest in this case should be the initMaterial method in Main.as. I’ll highlight the material setup here.

Multi-pass materials

The demo is using a multi-pass material for a few reasons. It allows the shadow mapping to be more complex, and it prevents the non-casting lights from being masked by the shadow map, allowing for more dramatic lighting effects.

Soft Shadows

With a SoftShadowMapMethod assigned to the material, it’s possible to – as the name suggests – cast soft shadows on the skin. The class was updated in Away3D 4.1 along with some other shadow map methods, allowing more samples to be taken in the shadow map. This way, the results are much smoother which is especially important when tweaking the method’s range property. It sets the distance with which shadow map samples are taken, effectively creating softer shadows (a setting you can tweak in the demos). Of course, the more samples you take, the more demanding your shader will become.

Fresnel Specular Highlights

This is identical to the old demo. It uses FresnelSpecularMethod to achieve the fresnel effect, causing stronger highlights at glancing viewing angles. The fresnelPower can be tweaked to change the viewing angle fall-off: higher values increase the fresnel effect.

Subsurface Scattering Through Gradient Diffuse Lighting

By using GradientDiffuseMethod, you can achieve a very crude approximation to subsurface scattering. It allows you to pass a gradient image to define the colour and strength of the diffuse reflection for each light/normal-angle. The left of the texture (x = 0) contains the reflection for surfaces pointing away from the light, the right is the reflection when completely facing the light. This allows darker lighting to be made a bit lighter. In this case, by introducing a little red in the mid-values (see “embeds/diffuseGradient.jpg”), we can simulate the addition of scattered light into the final shaded colour. While it doesn’t create true translucency effects similar to SubsurfaceScatteringDiffuseMethod, it does create an organic softness when viewing the surface head-on, an effect that is as subtle as it is effective.
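As a rough illustration of the gradient lookup (written in C++ for consistency with the alignment post, not taken from the Away3D shader code), the light/normal angle can be remapped to a horizontal texture coordinate like this. The linear remap of the dot product from [-1, 1] to [0, 1] is an assumption that matches the description above; Vec3, dot and gradientCoordinate are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Horizontal lookup coordinate into the gradient texture, in [0, 1].
// Both vectors must be unit length; toLight points from the surface
// towards the light.
inline float gradientCoordinate(const Vec3& normal, const Vec3& toLight)
{
    assert(std::fabs(dot(normal, normal) - 1.0f) < 1e-3f);
    assert(std::fabs(dot(toLight, toLight) - 1.0f) < 1e-3f);

    float nDotL = dot(normal, toLight); // -1 (facing away) .. 1 (facing the light)
    return 0.5f * nDotL + 0.5f;         //  0 (left edge)   .. 1 (right edge)
}
```

With a plain Lambert term everything below a dot product of 0 would be black. Because the gradient covers the whole range, the texture author decides how much (reddish) light still shows up around and past the terminator.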

Lighting Set-up

The light set-up is very traditional, a directional light coming from above and the front (by default) with a bright blue point light to the right and a red point light to the left. It’s worth noting however that I’ve set the directional light’s specular property to a much lower value to prevent it from creating strong highlights. Much like a softbox, I’d imagine.

Enjoy tweaking!

Multi-pass Rendering and Cascaded Shadow Mapping

If you’ve been paying attention to the Away3D blog, you probably already know that the 4.1 alpha has been released today, which has been my main fixation since September (when I wasn’t getting radiation poisoning). As mentioned in the release post, one of the new features is “multipass shading”. If you’re not into 3D rendering programming, you may be asking yourself: “¿Qué?”, so I would like to go a bit more in-depth into the whats, whys, and hows.

Multi-pass shading

Multipass rendering is simply executing different render calls (“passes”) for a single geometry. Strictly speaking, Away3D 4.0 has supported multiple passes since its inception; SubSurfaceScatteringDiffuseMethod caused a depth map pass to be added, and OutlineMethod introduces a new pass to render the outline separately from the normal geometry. However, all lighting – and methods contributing to the colour of the final output pixel such as fog, rim lighting, etc – happened in a single pass. With shaders being limited in number of instructions and registers, so is the number of lights that can affect a surface, as well as the complexity of any other piece of code, shadow mapping being a common victim. This is where 4.1 introduces multiple passes (hence multi-pass shading) in the form of TextureMultiPassMaterial and ColorMultiPassMaterial. For all intents and purposes, they work identically to their single-pass counterparts (TextureMaterial and ColorMaterial, respectively), but they automatically split up into different passes depending on the number of lights and the use of shadow mapping. The results are blended together to form a correctly shaded surface. This way, there’s no strict upper limit for lights. Of course, drawing the geometry multiple times comes with its own cost, and you should take care not to go gung-ho just because you can!

Cascade Shadow Mapping

Having shadows in a separate pass greatly increases what you can do to increase shadow quality. Getting shadows to look great is always a bit of a challenge, and one of the techniques that’s popular is Cascaded Shadow Maps (a great explanation here). If you’re too lazy to read the msdn article, here’s the gist of it. Due to perspective projection, fragments close to the viewer generally suffer shadow aliasing, because a texel in the shadow map covers much more screen space closer to the camera than it does farther away. In fact, for distant fragments, the shadow map resolution is often too large since several shadow map texels map to the same screen fragment. So there’s not enough resolution in the front, and possibly too much in the back! Cascaded Shadow Maps try to solve this by splitting up the view frustum and rendering separate shadow maps for each segment. The closer to the viewer, the smaller the frustum segment should be so as to increase the projected resolution of the shadow map.

CSM in Away3D 4.1

As you’ll see in demo code, it’s pretty easy to get this technique to work. Just be sure to use a multi-pass material and assign CascadeShadowMapper to the light and a CascadeShadowMapMethod to the material (this is identical to how NearDirectionalShadowMapper and NearFieldShadowMapMethod work).

// can pass a value up to 4 in CascadeShadowMapper
cascadeShadowMapper = new CascadeShadowMapper(3);
cascadeShadowMapper.lightOffset = 10000;
directionalLight = new DirectionalLight(-1, -15, 1);
directionalLight.shadowMapper = cascadeShadowMapper;
scene.addChild(directionalLight);

multiMaterial = new TextureMultiPassMaterial(texture);
multiMaterial.lightPicker = new StaticLightPicker([directionalLight]);
// you can also use HardShadowMapMethod, SoftShadowMapMethod, DitheredShadowMapMethod as base method
baseShadowMethod = new FilteredShadowMapMethod(directionalLight);
multiMaterial.shadowMapMethod = new CascadeShadowMapMethod(baseShadowMethod);

Away3D splits up the view frustum automatically, but it’s often a good idea to play around with different split ratios yourself. The values with the best results depend heavily on the scene, how close you allow the camera to get to shaded surfaces, and so on. This is done by simply specifying the ratio in the view frustum for each cascade level to reach, “0” being the frustum’s near plane, “1” being the far plane. Generally, you’ll want the last cascade level to reach to the far plane, thus passing “1”.

For example, when using 4 cascade levels, you could specify the splits as follows:

cascadeShadowMapper.setSplitRatio(0, .1);
cascadeShadowMapper.setSplitRatio(1, .2);
cascadeShadowMapper.setSplitRatio(2, .4);
cascadeShadowMapper.setSplitRatio(3, 1.0);
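To get a feel for what those ratios mean in scene units, here is a small C++ sketch (not Away3D code) that maps split ratios to view-space depths. It assumes the ratio is interpolated linearly between the near and far plane, which is the straightforward reading of the description above; the engine’s internals may differ. splitDepths is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Maps split ratios (0 = near plane, 1 = far plane) to view-space depths,
// assuming a linear interpolation between the two planes.
inline std::vector<float> splitDepths(float nearPlane, float farPlane,
                                      const std::vector<float>& ratios)
{
    assert(farPlane > nearPlane);

    std::vector<float> depths;
    depths.reserve(ratios.size());
    for (std::size_t i = 0; i < ratios.size(); ++i)
    {
        assert(ratios[i] >= 0.0f && ratios[i] <= 1.0f);
        depths.push_back(nearPlane + ratios[i] * (farPlane - nearPlane));
    }
    return depths;
}
```

With a near plane at 1 and a far plane at 1001, the ratios .1, .2, .4 and 1.0 put the cascade boundaries at depths 101, 201, 401 and 1001. The first cascade then covers only a tenth of the depth range, which is where the extra shadow resolution close to the camera comes from.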

A small detail to note: because texture memory is a precious resource, Away3D internally only uses a single texture for all (up to 4) shadow maps.

The demo and source

The demo that was built to demonstrate the multi-pass materials features the Sponza model built and made available by Crytek. You can find the original at their download site. It’s a bit of a behemoth, so give it some time to load ;) It allows you to play around with some of the shadow map settings, which should be pretty straightforward. At least, they will be when I get around to writing an upcoming blog post about shadow map filter types. Source for the demo can be found on Github.

If you were at my presentation at Reasons to be Creative (what were you doing, missing out on the great weather outside!) you may recognize the set-up. It’s the same as I used for the deferred rendering demo; however, this is not deferred rendering: we merely re-purposed the demo :)

Get the Away3D 4.1 alpha version in the dev branch on Github.

Away3D 4.1 (dev) Dynamic Reflections

One of the features we considered important for the next release of the Away3D engine (4.1) was real-time dynamic reflections, allowing for more realism and precision than the common static environment maps. In the dev branch of the engine, you can now find two flavours: reflections based on dynamic environment maps, and planar reflections.

Dynamic Environment Maps

This technique simply uses cube maps that are rendered to on the fly. Usually, they’re used for any non-planar surfaces, and while they can look convincing enough for complex models they do suffer the same flaws as normal environment maps. Since the scene is rendered for each face of the cube from a single point, the calculated reflections would obviously only be correct for that single point in space, but using it for relatively small and complex objects can look convincing enough. However, since it renders the scene 6 times, it can be slow for more complex situations.

The necessary functionality is exposed through CubeReflectionTexture, a class that can be used wherever a CubeTextureBase is expected: EnvMapMethod and variations, or even as a Skybox texture. I have yet to come up with a use for the latter case, though ;)  To get the best results, it’s usually a good idea to set the CubeReflectionTexture’s position to the centre of your reflective object. The cube map will be generated from this point and on average will yield the best results for all other points.

Check out the demo.
Source in the “dev” examples repository

Planar Reflections

For planar surfaces, a much cheaper approach and one that is very precise can be used. This means normal flat mirrors, polished floors, water, … which are quite common in games can be rendered much more effectively. Since the rules for reflection for the entire surface are the same, we can simply render the scene from a mirrored camera perspective. The only thing we need to make sure is that objects (or parts of objects) behind the mirror aren’t being rendered in this way. Usually, in OpenGL or DirectX, you’d simply introduce a new user-defined clip plane. Flash, however, doesn’t support anything of the sort. Instead, the projection matrix needs to be adapted so that the near plane becomes oblique, aligned with the mirror plane, effectively clipping any straddling geometry along the mirror. Unfortunately, this also wreaks havoc on the far plane, which will start cutting geometry that is in the mirrored view due to being at a different angle. Eric Lengyel effectively describes the issue and how he cleverly solves it in his Oblique View Frustum paper. (And while I’m at it, his book is awesome too.)
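The “mirrored camera” part boils down to reflecting positions and directions across the mirror plane. The sketch below shows that step in C++ (not engine code); it covers neither the oblique near plane nor anything from Lengyel’s paper. Vec3, Plane and the helper functions are hypothetical names, and the plane is assumed to be stored as a unit normal n and an offset d, with points on the plane satisfying n·p + d = 0.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Points p on the plane satisfy dot(normal, p) + d == 0; normal is unit length.
struct Plane { Vec3 normal; float d; };

inline float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Signed distance to the plane: positive in front of the mirror, negative behind.
inline float signedDistance(const Plane& plane, const Vec3& p)
{
    assert(std::fabs(dot(plane.normal, plane.normal) - 1.0f) < 1e-3f);
    return dot(plane.normal, p) + plane.d;
}

// Mirrors a position (e.g. the camera position) across the plane.
inline Vec3 reflectPoint(const Plane& plane, const Vec3& p)
{
    float k = 2.0f * signedDistance(plane, p);
    Vec3 r = { p.x - k * plane.normal.x,
               p.y - k * plane.normal.y,
               p.z - k * plane.normal.z };
    return r;
}

// Mirrors a direction (e.g. the camera's forward vector); the offset d plays no role.
inline Vec3 reflectDirection(const Plane& plane, const Vec3& v)
{
    float k = 2.0f * dot(plane.normal, v);
    Vec3 r = { v.x - k * plane.normal.x,
               v.y - k * plane.normal.y,
               v.z - k * plane.normal.z };
    return r;
}
```

For a floor mirror at y = 1 (normal (0, 1, 0), d = -1), a camera at (0, 3, 0) ends up at (0, -1, 0). Anything with a negative signed distance is behind the mirror and must not show up in the reflection, which is the job the oblique near plane takes over.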

The texture target is provided as PlanarReflectionTexture. Similar to the cube maps, it needs some information about where its reflective surface is. In this case, it’s the plane property, referencing a Plane3D object. Furthermore, it has a “scale” property that lets you define how much the texture should be scaled down to control quality vs rendering speed. Due to the different math and texture types involved, PlanarReflectionTexture can only be used with specific material methods. Currently, these are PlanarReflectionMethod and FresnelPlanarReflectionMethod. Except for internals and texture type, these function pretty much identically to their EnvMap counterparts.

Check out the demo
Source in the “dev” examples repository.

In closing, this is of course only in the dev branch, which means it’s still subject to change!

© 2009 Der Schmale – David Lenaerts’s blog. All Rights Reserved.