Should the model view projection matrix be built in ActionScript 3 or on the GPU in the vertex shader?

Question

All of the Stage3D examples I have seen build the model view projection matrix in AS3 on every render event, e.g.:

modelMatrix.identity();
// Build the model matrix here, e.g. with
// modelMatrix.appendScale(), appendRotation() and appendTranslation()
...
modelViewProjectionMatrix.identity();
modelViewProjectionMatrix.append( modelMatrix );
modelViewProjectionMatrix.append( viewMatrix );
modelViewProjectionMatrix.append( projectionMatrix );
// Model view projection matrix to vertex constant register 0
context3D.setProgramConstantsFromMatrix( Context3DProgramType.VERTEX, 0, modelViewProjectionMatrix, true );
...

And a single line in the vertex shader transforms the vertex into clip space:

m44 op, va0, vc0

Is there a reason for doing it this way? Aren't these kinds of calculations what the GPU was made for?

Why not instead update the view and projection matrices only when they change, and upload each to a separate register:

// Projection matrix to vertex constant register 0
// This could be done once on initialization or when the projection matrix changes
context3D.setProgramConstantsFromMatrix(Context3DProgramType.VERTEX, 0, projectionMatrix, true);
// View matrix to vertex constant register 4
context3D.setProgramConstantsFromMatrix(Context3DProgramType.VERTEX, 4, viewMatrix, true);

Then, on each frame and for each object:

modelMatrix.identity();
// Build the model matrix here, e.g. with
// modelMatrix.appendScale(), appendRotation() and appendTranslation()
...
// Model matrix to vertex constant register 8
context3D.setProgramConstantsFromMatrix(Context3DProgramType.VERTEX, 8, modelMatrix, true);
...

And the shader would instead look like this:

// Perform the model view projection transformation, storing intermediate results in temporary register 0 (vt0)
// - Multiply the vertex position by the model matrix (vc8)
m44 vt0, va0, vc8
// - Multiply the result by the view matrix (vc4)
m44 vt0, vt0, vc4
// - Multiply the result by the projection matrix (vc0) and write it to the output register
m44 op, vt0, vc0
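For a rough sense of the trade-off, it helps to count multiplications. Because matrix products are associative, the three-m44 shader produces the same result as the single-m44 version; it just redoes the concatenation for every vertex. The figures below are a back-of-the-envelope sketch, assuming the transforms are written with column vectors, dense 4x4 matrices (64 multiplications per matrix-matrix product, 16 per matrix-vector product) and a mesh of N vertices. They ignore constant-upload cost and the fact that the GPU works in parallel.

$$
\mathbf{v}_{\text{clip}} = P\,\bigl(V\,(M\,\mathbf{v})\bigr) = (P\,V\,M)\,\mathbf{v}
$$

$$
\underbrace{2 \cdot 64}_{\text{form } PVM \text{ in AS3}} + \underbrace{16N}_{\text{one m44 per vertex}}
\quad\text{vs.}\quad
\underbrace{3 \cdot 16N}_{\text{three m44 per vertex}}
$$

$$
128 + 16N < 48N \iff N > 4
$$

So in raw arithmetic, concatenating once per object does less work for any mesh with more than four vertices; whether that translates into frame time depends on how slow the AS3 side is.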

UPDATE

I have since found another question that may already answer this one:
DirectX world view matrix multiplications - GPU or CPU the place

Loading the identity typically means grabbing a fresh copy of the matrix before any transformations have been applied to it, and that is the only reason for doing so. For example, in my C++ engine I only grab the identity once, when the rendering loop starts, because all my transformations afterward are relative (I've re-created the Flash display list). Basically, you should only need to reload the identity when you're transforming objects that are direct children of the stage. All of their children, and so on, can be transformed relatively. — , Apr 09 '12 at 01:31
When you reload the identity, it's basically like saying "give me the root matrix as if I never moved anywhere, with coordinates starting at 0,0". So if you do translations on a root-level object and then do not reload the identity, you'll be performing the next object's translations as if the scene origin were the x/y/z position of the last object transformed. Note that I'm writing this as comments and not answers because my knowledge applies to OpenGL and not Flash. However, if Flash follows "the norm", this knowledge should apply in Flash too. — , Apr 09 '12 at 01:34
Thanks for the reply. I do understand what the identity does, what I was referring to was creating the final model view projection matrix on the CPU vs the GPU. — cmann, Apr 09 '12 at 07:44

score 1 · Answer 1 · answered May 22 '12 at 10:07

This is a tough optimization problem. The first thing you should ask: Is that really a bottleneck? If yes, you have to consider this:

Doing the matrix multiply in AS3 is slower than it should be.
Extra matrix transforms in the vertex program are practically free.
Setting one matrix is faster than setting multiple matrices as constants!
Do you need the concatenated matrix somewhere else anyway? Picking maybe?

There is no simple answer. For speed I would let the GPU do the work. But in many cases you might want a compromise: send the model->world and the world->clip matrices separately, like classic OpenGL. For Molehill (Stage3D) specifically, do more of the work on the GPU in the vertex program. But always make sure that this issue is really a bottleneck before worrying about it too much.
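The compromise can be costed the same way. This is a sketch counting multiplications only, assuming K objects of N vertices each, with the world->clip matrix PV formed once per frame in AS3 and only the model->world matrix M uploaded per object. The alternative it is compared against reuses the same cached PV but still forms a full MVP per object.

$$
\text{compromise (two m44 per vertex):}\quad 64 + 32KN
$$

$$
\text{MVP per object, cached } PV \text{ (one m44 per vertex):}\quad 64 + K\,(64 + 16N)
$$

The difference is K(16N - 64), so in raw arithmetic the per-object MVP still does fewer multiplications whenever N > 4. What the compromise buys is one fewer AS3 matrix product per object, which only pays off if, as the answer suggests, matrix math in AS3 is the slow part.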

tl;dr: Do it in the vertex program if you can!

score 1 · Answer 2 · answered Jul 19 '12 at 15:31

Don't forget that the vertex shader runs per vertex, so you end up doing the multiplication hundreds of thousands of times per frame, while the AS3 version does it only once per object per frame.
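To put illustrative numbers on this, take a hypothetical scene of K = 100 objects with N = 1,000 vertices each, and compare the AS3 cost of concatenating P, V and M per object with the extra GPU work of running three m44 instructions per vertex instead of one (multiplications only):

$$
\text{AS3 concatenation:}\quad K \cdot 2 \cdot 64 = 100 \cdot 128 = 12{,}800
$$

$$
\text{extra GPU work:}\quad K \cdot N \cdot (48 - 16) = 100 \cdot 1{,}000 \cdot 32 = 3{,}200{,}000
$$

That is 250 times as many multiplications. The GPU executes them in parallel, so this is not a frame-time prediction, but it shows how much faster per-vertex work grows than per-object work.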

As with every performance problem:

Optimize stuff that runs often and ignore the things that run only now and then.
