Der Schmale – David Lenaerts’s blog

Flash Platform Experiments

Building efficient content in Away3D 4.0: Sharing Materials & Geometries

Time to dig deeper into Away3D 4.0 and see how you can structure and manage your content to get the best performance! If you haven’t read the previous posts, it wouldn’t hurt to do so now, since this one will be all about Materials (on the Mesh side of things) and Geometries.

In any sort of project, if you have data collections that use a lot of memory, it makes sense not to duplicate them when the same data is needed by several clients (clients simply meaning “users of the data”). Instead, you make all clients refer to the same data object, or model. Similarly, you probably wouldn’t want to perform the same expensive operations on this data for every client. The shared model can take care of that only once for all clients. In the Away3D scene graph structure, the Geometry is essentially your model, and a Mesh refers to it. Many Meshes can make use of the same Geometry. Similarly, Materials can be shared across Meshes as well. This all sounds very logical, yet for some reason I can’t fathom, it’s a principle that’s often violated. Even though Away3D internally tries to minimize memory use and tries to share things as best as it can (it caches Program3D instances, Texture objects, etc., across material instances), I can’t stress this enough: reuse Material and Geometry instances whenever possible. Not only does it limit CPU and GPU memory usage, it’s also essential for performance. I’ll explain why.

Reusing Materials

Every material uses a “shader program”, which is represented in ActionScript by a Program3D object. It’s a combination of a vertex shader and a fragment shader. I won’t go into the subject of shader programs in detail, but if you’re interested, check Wikipedia for starters. A large part of the shader program’s job is to define the colour of every pixel that’s being drawn. In other words, it controls how your rendered object will be coloured, textured, lit, shaded, … If it’s on your screen, it passed through a shader program. As a result, a different material means a different Program3D, and when drawing objects, the GPU needs to be told to switch programs every time the material changes. Unfortunately, this switch is typically very expensive and something you’d want to avoid as much as possible. To this end, Away3D sorts all the objects that need to be rendered (the so-called “renderables”) primarily according to their materials, so all objects with the same material will be rendered consecutively. If you’re using different material instances for things that look exactly the same, it means programs are switched needlessly.
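To make the effect of sorting concrete, here is a minimal sketch in TypeScript. It is a conceptual model only: `Renderable`, `materialId`, `countMaterialSwitches` and `sortByMaterial` are hypothetical names for illustration and are not part of Away3D. It simply counts how often the "current material" would change while walking a draw list.

```typescript
// Conceptual model only: these types are hypothetical, not Away3D classes.
interface Renderable {
  name: string;
  materialId: number; // stands in for a material instance
}

// Counts how often the renderer would have to change material state
// while drawing the list in the given order.
function countMaterialSwitches(drawList: Renderable[]): number {
  let switches = 0;
  let current: number | null = null;
  for (const renderable of drawList) {
    if (renderable.materialId !== current) {
      switches++;
      current = renderable.materialId;
    }
  }
  return switches;
}

// Returns a copy ordered by material, so equal materials are drawn consecutively.
function sortByMaterial(drawList: Renderable[]): Renderable[] {
  return drawList.slice().sort((a, b) => a.materialId - b.materialId);
}

const scene: Renderable[] = [
  { name: "barrel1", materialId: 2 },
  { name: "demon1", materialId: 1 },
  { name: "barrel2", materialId: 2 },
  { name: "demon2", materialId: 1 },
];

const unsortedSwitches = countMaterialSwitches(scene);
const sortedSwitches = countMaterialSwitches(sortByMaterial(scene));
```

With two shared materials, the sorted list needs one switch per material. If every object had its own material instance, sorting could not reduce the count at all.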

Sharing materials allows more renderables (here SubGeometry objects) to be rendered with fewer switches

Okay, I’ll admit, that isn’t entirely true. For those interested, I’ll explain what really happens. Whenever a material is created or changed so that it needs to create a new shader program, the vertex and fragment code is regenerated. A Program3D is requested from a central cache, but if a Program3D object already exists for that exact code combination, the existing object is returned. Different materials using the same code also use the same programs, which means there won’t be any program switching. However, it’s easy enough to accidentally have small differences between materials so that the code – and in turn the program instance – changes. And the following is still true regardless:
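The caching idea described above can be sketched roughly as follows. This is an illustrative model in TypeScript, not Away3D's actual cache: `CompiledProgram`, `ProgramCache` and `getProgram` are hypothetical names, and the shader code is treated as opaque strings.

```typescript
// Hypothetical stand-in for a compiled GPU program; not the Flash Program3D API.
class CompiledProgram {
  constructor(
    public readonly vertexCode: string,
    public readonly fragmentCode: string
  ) {}
}

class ProgramCache {
  private programs: { [key: string]: CompiledProgram } = {};
  public compileCount = 0;

  // Returns the existing program for this exact code pair,
  // and only "compiles" a new one on a cache miss.
  getProgram(vertexCode: string, fragmentCode: string): CompiledProgram {
    const key = vertexCode + "\n---\n" + fragmentCode;
    const existing = this.programs[key];
    if (existing) {
      return existing;
    }
    const program = new CompiledProgram(vertexCode, fragmentCode);
    this.programs[key] = program;
    this.compileCount++;
    return program;
  }
}

const cache = new ProgramCache();
// Two "materials" that generate identical code share one program...
const programA = cache.getProgram("vertex-code-1", "fragment-code-1");
const programB = cache.getProgram("vertex-code-1", "fragment-code-1");
// ...but any small difference in the generated code yields a new one.
const programC = cache.getProgram("vertex-code-1", "fragment-code-2");
```

Note that the cache key is the exact code combination, which is why small accidental differences between materials are enough to defeat the sharing.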

Every time a material changes, certain data needs to be uploaded to the GPU: Textures, lighting properties, … No caching of Program3D objects is going to prevent that. If you use a new material instance for every object, these uploads may occur for every object. Share a material, and it only happens once for all the renderables using it.

Note: if you really need several material instances, but you’re using the same BitmapData objects as textures or normal maps, at least try to share those across materials unless you have a very good reason not to. RAM’s not for wasting! Furthermore, there’s a limit to the number of Texture objects (buffer objects representing the image data on the GPU) that can be created, and for every BitmapData, a Texture is created. You wouldn’t want to go over the limit!

Reusing Geometries

Similarly, Meshes that use the same data should all refer to a single Geometry instance. A very common example is a game in which there are several occurrences of the same monster type spread throughout the game map. You can have one “beer demon” Geometry, and place many beer demons in your level by making the different Mesh objects refer to that single Geometry instance. Not only does this use much less memory, it also causes fewer instances of VertexBuffer3D to be created (these are buffers representing the vertex data on the GPU side of things). There’s a limit on how many can exist at any given time, and if you exceed it, an error will be thrown and your game will crash. Sharing Geometry objects can also result in better performance in some cases: if the same Geometry was used for the previously rendered object, its data may not have to be re-uploaded to the GPU!
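As a rough illustration of why sharing keeps the buffer count down, here is a small TypeScript model. The class names echo the ones in this post, but this is not Away3D code: `GpuStats`, `getVertexBuffer` and `renderAll` are hypothetical, and a `Float32Array` stands in for a GPU vertex buffer.

```typescript
// Hypothetical model: not Away3D code. Tracks how many "GPU buffers" get created.
class GpuStats {
  buffersCreated = 0;
}

class Geometry {
  private buffer: Float32Array | null = null;

  constructor(private readonly vertices: number[]) {}

  // Lazily creates the buffer once per Geometry; later calls reuse it.
  getVertexBuffer(stats: GpuStats): Float32Array {
    if (this.buffer === null) {
      this.buffer = new Float32Array(this.vertices);
      stats.buffersCreated++;
    }
    return this.buffer;
  }
}

class Mesh {
  constructor(public readonly geometry: Geometry, public x: number) {}
}

// "Renders" every mesh and reports how many buffers had to be created.
function renderAll(meshes: Mesh[]): number {
  const stats = new GpuStats();
  for (const mesh of meshes) {
    mesh.geometry.getVertexBuffer(stats);
  }
  return stats.buffersCreated;
}

const demonVertices = [0, 0, 0, 1, 0, 0, 0, 1, 0];
const sharedGeometry = new Geometry(demonVertices);
const sharing: Mesh[] = [];
const duplicating: Mesh[] = [];
for (let i = 0; i < 100; i++) {
  sharing.push(new Mesh(sharedGeometry, i));                  // one Geometry for all
  duplicating.push(new Mesh(new Geometry(demonVertices), i)); // one Geometry each
}

const buffersWhenSharing = renderAll(sharing);
const buffersWhenDuplicating = renderAll(duplicating);
```

In this model a hundred beer demons cost one buffer when they share a Geometry and a hundred buffers when they don't, which is how a scene ends up hitting the buffer limit.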

Material/Geometry Tandems

In many cases, especially in games, you’ll have a collection of game object “types”, such as different monsters, a barrel, or weapon/health/power-up pick-ups. Each of these types will probably have its own specific Geometry and Material objects. That makes sense, since you typically wouldn’t want to use a barrel geometry for a monster, or a shotgun texture on a health boost pick-up, would you? This one-to-one mapping is a pretty fortunate situation, because it means that the rendering for these objects can be batched very efficiently. Material or geometry data only needs to be uploaded when the renderer moves on to a different object type; the exception is scene-graph-specific data such as transformation matrices.

Rendering objects with one-to-one Material/Geometry combinations

Making sure that the same Geometry and Material instances are used across all instances of the same game object type is easy. The Mesh class contains a clone() method that creates a new Mesh instance that refers to the same Geometry and Material. This way, you can build a library of reference Meshes that are not actually added to the scene. Instead, you populate your scene with clones of them. You can dispose of the clones whenever they’re picked up (health pack) or killed (monster), while the reference mesh and its material and geometry can be disposed of when everything is unloaded. For this reference library, you can use Away3D’s AssetLibrary, which will take care of all your loading and management needs :)
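The reference-library pattern could look something like the sketch below. It is written in TypeScript as a conceptual model and does not use Away3D's Mesh or AssetLibrary API: `GeometryRef`, `MaterialRef`, `CloneableMesh`, `MeshLibrary`, `register` and `spawn` are all hypothetical names.

```typescript
// Hypothetical model of the pattern; not Away3D's Mesh or AssetLibrary API.
interface GeometryRef { name: string; }
interface MaterialRef { name: string; }

class CloneableMesh {
  x = 0;

  constructor(
    public readonly geometry: GeometryRef,
    public readonly material: MaterialRef
  ) {}

  // New mesh object, same geometry and material (shared by reference, not copied).
  clone(): CloneableMesh {
    const copy = new CloneableMesh(this.geometry, this.material);
    copy.x = this.x;
    return copy;
  }
}

class MeshLibrary {
  private references: { [type: string]: CloneableMesh } = {};

  // Stores a reference mesh that is never added to the scene itself.
  register(type: string, reference: CloneableMesh): void {
    this.references[type] = reference;
  }

  // Creates a scene instance by cloning the reference mesh for that type.
  spawn(type: string, x: number): CloneableMesh {
    const reference = this.references[type];
    if (reference === undefined) {
      throw new Error("Unknown object type: " + type);
    }
    const instance = reference.clone();
    instance.x = x;
    return instance;
  }
}

const library = new MeshLibrary();
library.register(
  "beerDemon",
  new CloneableMesh({ name: "demonGeometry" }, { name: "demonMaterial" })
);

const demonA = library.spawn("beerDemon", 10);
const demonB = library.spawn("beerDemon", 25);
```

Each spawned demon is its own object with its own position, but both point at the same geometry and material, so disposing of one clone leaves the shared data intact for the others.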

For example

Here’s a comparison:

Demo NOT reusing anything (warning, can be very slow!)

Demo reusing both Geometry and Materials

Source for both

I’ll admit, a large cost in the first demo is the creation of the objects rather than their use. This creation is a constant cost, however, and you can see the performance drop drastically over a short time. So… Enough incentive? :-)

Update

In the “dev” branch (and when we go Beta it will also be in the master branch), BitmapMaterial was removed in favour of TextureMaterial, accepting a texture rather than a BitmapData. It goes without saying that you should share these texture objects whenever possible rather than creating new textures with the same content.

All sources for this post have been updated to use said dev branch: https://github.com/away3d/away3d-core-fp11/tree/dev

Some Flash Pixel Bender performance tips + benchmarks

Since I started playing around with Pixel Bender in Flash, I’ve been trying out some different approaches here and there and learned a thing or two on performance optimizations (and quirks). As many people use PB specifically for its performance, and not much has been written on the subject, I thought I’d share my experiences and back them up with some benchmarks. Some of the things here are pretty obvious, yet others can be surprising and even frustrating.

Remember that this concerns Shaders in Flash Player, not Photoshop or After Effects, and that results could change in future versions. All benchmarks were performed on my crummy PC (AMD Athlon 64 X2 Dual, 2.21 GHz, 2 GB RAM, Win XP), using 500×500 data with 4 channels, each performing 10 consecutive kernel executions. The kernel itself is just a read, a multiplication, a division, and a sqrt. ShaderJobs are performed synchronously.

Let’s get the obvious out of the way first (I won’t go into common sense optimizations too much).

Well, duh!

  • Use 4 channels only if necessary. No transparency? Ditch it.
  • Precalculate recurring constant calculations in Flash and pass them as parameters (such as width*height). Sure, it makes the “interface” of your Kernel potentially harder to read, but since Flash doesn’t support dependents (I hope it will some day), this should be a no-brainer if performance is really important.
  • If only a part of a BitmapData needs to be processed, isolate it into a new BitmapData using copyPixels. Even when using applyFilter, sourceRect is buggy.

Told you it was obvious :p Now, some better ones.

Use ShaderJob, not applyFilter

  • ShaderJob (on BitmapData) benchmark: 92-99ms
  • applyFilter benchmark: 104-109ms
  • ShaderJob ~ 10% faster

BitmapData is faster than ByteArray is faster than Vector.<Number>!

I’ve seen (and been guilty of) a lot of copying BitmapData into a Vector to harness “the power of Vector”. But look at this:

  • ShaderJob on BitmapData: 92-99ms
  • ShaderJob on ByteArray: 147-172ms
  • ShaderJob on Vector.<Number>: 167-192ms
  • BitmapData is ~40% faster than ByteArray
  • BitmapData is ~47% faster than Vector.<Number>!!

Use BitmapData unless you have no other choice, or if complete floating point precision is important.
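The post doesn't spell out how the percentages are derived. As a sanity check, the sketch below assumes they compare the midpoints of the measured ranges and express the difference as "time saved relative to the slower variant". That interpretation, and the helper names `midpoint` and `percentLessTime`, are assumptions made for this example.

```typescript
// Midpoint of a "min-max ms" range from the benchmark lists.
function midpoint(minMs: number, maxMs: number): number {
  return (minMs + maxMs) / 2;
}

// Percentage of time saved by the fast variant relative to the slow one,
// rounded to a whole percent.
function percentLessTime(fastMs: number, slowMs: number): number {
  return Math.round((1 - fastMs / slowMs) * 100);
}

const shaderJobOnBitmapData = midpoint(92, 99); // 95.5
const applyFilterTime = midpoint(104, 109);     // 106.5
const shaderJobOnByteArray = midpoint(147, 172); // 159.5
const shaderJobOnVector = midpoint(167, 192);    // 179.5

const vsApplyFilter = percentLessTime(shaderJobOnBitmapData, applyFilterTime);    // 10
const vsByteArray = percentLessTime(shaderJobOnBitmapData, shaderJobOnByteArray); // 40
const vsVector = percentLessTime(shaderJobOnBitmapData, shaderJobOnVector);       // 47
```

Under that reading, the quoted ~10%, ~40% and ~47% all line up with the measured ranges, so "faster" here means "takes that much less time", not "processes that much more data per second".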

Conditionals are expensive!

This one annoys me quite a bit. Imagine you’re doing some calculations that you don’t need to do when alpha == 0 (which, as it happens, is usually the case). It can be a good idea to do them anyway in favour of dropping the alpha == 0 check. For the benchmark, I used values that had alpha set to 0 for about half of the data! Compare results to the previous benchmark.

  • BitmapData: 134-192ms – ~47% speed loss!!!
  • ByteArray: 147-172ms – ~22% speed loss
  • Vector: 192-213ms – ~27% speed loss

In practice, test one version with the conditional and one without. The results vary heavily depending on how often calculations are skipped, and how many calculations are otherwise performed. Still, with half of the (admittedly rather trivial) calculations skipped in this case, it’s stupefying that execution time increases this much.

Do not use the input as the output

When using a ShaderJob or applyFilter, don’t use the BitmapData/ByteArray/Vector instance that functions as the source as the output as well. If you need iteration, you’re better off swapping two buffers. Otherwise, Flash Player needs to make a temporary copy of the source, which slows things down.
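Swapping two buffers (often called "ping-pong" buffering) can be sketched as below. This is a TypeScript model of the idea rather than ShaderJob code: `runKernel` is a hypothetical stand-in for one kernel execution, and plain `Float32Array`s stand in for the BitmapData/ByteArray/Vector buffers.

```typescript
// Hypothetical stand-in for one kernel pass: reads `source`, writes `target`.
function runKernel(source: Float32Array, target: Float32Array): void {
  for (let i = 0; i < source.length; i++) {
    target[i] = source[i] * 0.5;
  }
}

// Runs several passes without ever using the same buffer as input and output.
function iterate(initial: Float32Array, passes: number): Float32Array {
  let front: Float32Array = new Float32Array(initial);        // current input
  let back: Float32Array = new Float32Array(initial.length);  // current output
  for (let pass = 0; pass < passes; pass++) {
    runKernel(front, back);
    // Swap roles: the output of this pass is the input of the next one.
    const swap = front;
    front = back;
    back = swap;
  }
  return front; // after the final swap, `front` holds the latest result
}

const result = iterate(new Float32Array([8, 4, 2]), 3);
```

Only two buffers are ever allocated, regardless of the number of passes, and no pass needs a temporary copy of its own input.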

Edit: The percentages here were originally computed against the normal ShaderJob test, although these runs use the alpha test. They have been updated to compare against the alpha test results.

  • BitmapData: 207-218ms – ~30% speed loss
  • ByteArray: 256-271ms – ~65% speed loss
  • Vector: 276-293ms – ~40% speed loss

Update: Asynchronous ShaderJob

I just tested it, and the results indicate that asynchronous calls (waitForCompletion=false) are slower than synchronous calls. I suppose that’s mainly because of the event handling flow. Another thing I tested was to run 2 asynchronous calls with data of half the size, but it seems only 1 asynchronous ShaderJob can be started at the same time.

That’s it, see for yourself!

In closing, I’ll mention something I usually do that doesn’t seem to have any effect (it’s actually a habit from ActionScript). When reading from the same coordinate multiple times, I often store outCoord() in a variable and use that in the sample function. Well, I tested it, and it doesn’t have any impact at all :)

That’s it, at least for now, I hope it’s helpful! Check the benchmark and its source (the source is in fact pretty ugly, but does the trick). I’d be happy to know what kind of results other hardware yields.

© 2009 Der Schmale – David Lenaerts’s blog. All Rights Reserved.
