Reconstructing positions from the depth buffer pt. 2: Perspective and orthographic general case

Introduction

It’s been a while since last time, when I promised a general method for depth reconstruction regardless of projection type. I had told myself to do it “soon”. Due to a lack of time, partly caused by moving to Ghent and other random occurrences, “soon” changed into “sooner or later”, and finally we’ve arrived at “later”. But here we are!
As a small disclaimer, implementing such a general case only makes sense if you need to support different projection types and can’t provide separate shaders for each. Bespoke implementations are obviously faster, especially in the case of orthographic projections.
For what follows I will continue to use excruciatingly slow step-by-step derivations.

The orthographic case

For completeness’ sake, I’ll show position reconstruction for orthographic-only projections. This is considerably easier than for perspective projections: the value stored in the depth buffer maps linearly between the near and far planes, so reconstructing the view-space z value is a simple linear interpolation between the two:

    \[ z_{view} = z_{near} + z_{ndc}(z_{far} - z_{near}) \]

Recall that in DirectX, z_{ndc} lies in [0, 1] and is simply the depth buffer value, so it can be used directly as the interpolation factor above. In OpenGL, z_{ndc} = 2 depth - 1 lies in [-1, 1], so for this linear interpolation you should use the stored depth value itself, which is already in the [0, 1] range. The complete position can be reconstructed in the same way:

    \[ P_{view} = P_{near} + z_{ndc}(P_{far} - P_{near}) \]

P_{near} and P_{far} are the positions on the near and far planes for the current pixel being shaded. They are calculated by bilinearly interpolating between the near and far frustum corners. By passing the frustum corners from the vertex shader to the fragment shader as interpolated values, this is handled by the hardware in the same way we interpolated the frustum direction D' in the previous post. The corners themselves are the 8 combinations of (\pm \frac{projectionWidth}{2}, \pm \frac{projectionHeight}{2}, z_{near or far}).
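
As an illustration, here is a minimal GLSL-style fragment shader sketch of the orthographic-only case. The varying and uniform names are placeholders rather than anything from the article, and a [0, 1] depth value is assumed as the interpolation factor:

// orthographic-only reconstruction sketch (illustrative names)
varying vec3 nearPosition;  // interpolated P_near for this pixel
varying vec3 farPosition;   // interpolated P_far for this pixel
varying vec2 uv;

uniform sampler2D depthMap;

void main()
{
    float depth = texture2D(depthMap, uv).x;                    // [0, 1] depth buffer value
    vec3 viewPosition = mix(nearPosition, farPosition, depth);  // P_near + depth * (P_far - P_near)
    // ... lighting calculations using viewPosition would go here ...
    gl_FragColor = vec4(viewPosition, 1.0);                     // placeholder output
}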

Reconstructing z: Generalization

Reconstructing the view-space z value for the general case doesn’t change much compared to the perspective-only version, except that we can’t make the same assumptions concerning the projection matrix: M_{34} and M_{44} take different values for orthographic projections (M_{44}, for one, is no longer 0). So we’re stuck with this:

    \[ z_{view} = -\frac{z_{ndc} M_{44} - M_{43}}{z_{ndc} M_{34} - M_{33}} \]

Review the previous post if you’re hazy on why.
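
As a GLSL-style sketch, assuming the four relevant projection matrix elements are uploaded as individual uniforms (the uniform names are made up; the 1-based indexing follows the article’s M_{ij} notation, and how you extract the elements depends on your matrix library’s storage convention):

uniform float projM33;   // M_33
uniform float projM34;   // M_34
uniform float projM43;   // M_43
uniform float projM44;   // M_44

// zNDC is the depth buffer value in DirectX, or 2.0 * depth - 1.0 in OpenGL
float generalViewZ(float zNDC)
{
    return -(zNDC * projM44 - projM43) / (zNDC * projM34 - projM33);
}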

Calculating the position from the z-value: Generalization

In the perspective-only case, we reconstructed the position value assuming the ray origin was 0. This is no longer the case for orthographic projections, as you can see below:

[Figure: unlike perspective view rays, orthographic view rays are parallel, so they generally do not pass through the view-space origin.]

Evidently, this means we’re going to have to use the full ray equation. We’ll have to define a ray origin point, which we’ll keep on the XY plane through the origin to remain compatible with the origin used in the perspective version. We’ll call this point O_0, with z_{O_0} = 0. For both cases, we define the ray as:

    \[ R(t) = O_0 + tD \]

Assuming nothing, we need to calculate O_0. The line coinciding with the ray can be expressed as L(t) using a different origin, one whose value we know. Let’s pick the near position P_{near}. O_0 is simply that line’s intersection with the XY plane, found by solving for z = 0:

    \[ L(t) = P_{near} + tD \]

    \[ z_{near} + t z_D = 0 \iff t = -\frac{z_{near}}{z_D} \]

Plugging t back into the line equation L(t), we get O_0:

    \[ O_0 = L(-\frac{z_{near}}{z_D}) = P_{near} - \frac{D}{z_D}z_{near} = P_{near} - D'z_{near} \]

Remember \frac{D}{z_D}? We defined this as D', the z-normalized view vector. Armed with a ray origin, we can redo the same math as before. We’re interested in the point on the ray that is the view position, ie: R(t) = P_{view}.

    \[ P_{view} = O_0 + tD \iff t = \frac{z_{view} - z_{O_0}}{z_D} = \frac{z_{view}}{z_D} \]

This uses the fact that z_{O_0} = 0. Plugging t back into the ray equation:

    \[ P_{view} = O_0 + tD = O_0 + z_{view}\frac{D}{z_D} = O_0 + z_{view}D' \]

I’ve taken the long way round showing what you probably already figured intuitively: it’s the same as the perspective case, but simply taking into account the origin vector.
Similarly to both the bespoke orthographic and perspective cases, O_0 and D' are precomputed for each corner of the quad that’s being rendered. The vertex shader can then simply pass them along to the fragment shader so they’re automatically interpolated for the pixel we’re currently operating on.
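
Putting it together, a GLSL-style fragment shader sketch of the general case could look like the following (again, all varying and uniform names are purely illustrative; the projM33 … projM44 uniforms are the same projection matrix elements as in the earlier sketch):

varying vec3 rayOrigin;   // interpolated O_0
varying vec3 rayDir;      // interpolated D' (z-normalized)
varying vec2 uv;

uniform sampler2D depthMap;
uniform float projM33, projM34, projM43, projM44;

void main()
{
    float zNDC = texture2D(depthMap, uv).x;   // use 2.0 * depth - 1.0 for OpenGL
    float viewZ = -(zNDC * projM44 - projM43) / (zNDC * projM34 - projM33);
    vec3 viewPosition = rayOrigin + viewZ * rayDir;
    // ... lighting calculations using viewPosition would go here ...
    gl_FragColor = vec4(viewPosition, 1.0);   // placeholder output
}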

Calculating the view vectors

All that’s left to do is calculate the z-normalized view vectors D' for each quad corner. This is simply done by calculating the near and far frustum corners and z-normalizing the difference between them. The corners in view space are handled exactly the same way as last time: by unprojecting NDC coordinates. The following code shows this, and also performs the ray origin calculations.

// For the near corners, you should set z = -1.0f instead of 0.0f for OpenGL.
// This time it does matter, since we're using the unprojected position for the origin calculation.
Vector3D nearHomogenousCorners[4] = {  
                Vector3D(-1.0f, -1.0f, 0.0f, 1.0f),
                Vector3D(1.0f, -1.0f, 0.0f, 1.0f),
                Vector3D(1.0f, 1.0f, 0.0f, 1.0f),
                Vector3D(-1.0f, 1.0f, 0.0f, 1.0f)
};

Vector3D farHomogenousCorners[4] = {   
                Vector3D(-1.0f, -1.0f, 1.0f, 1.0f),
                Vector3D(1.0f, -1.0f, 1.0f, 1.0f),
                Vector3D(1.0f, 1.0f, 1.0f, 1.0f),
                Vector3D(-1.0f, 1.0f, 1.0f, 1.0f)
};


Matrix3D inverseProjection = projectionMatrix.Inverse();
Vector3D rays[4];
Vector3D origins[4];

for (unsigned int i = 0; i < 4; ++i) { 
    Vector3D& ray = rays[i];

    // unproject the far and near frustum corners from NDC to view space
    Vector3D nearPos = inverseProjection * nearHomogenousCorners[i];
    Vector3D farPos = inverseProjection * farHomogenousCorners[i];
    nearPos /= nearPos.w;
    farPos /= farPos.w;
    ray = farPos - nearPos;

    // z-normalize this vector
    ray /= ray.z;      

    // the ray origin O_0 = P_near - D' * z_near (where the ray crosses the z = 0 plane)
    origins[i] = nearPos - ray * nearPos.z;
}

Working in world space

You can switch to world-space reconstruction simply by transforming both D' and O_0 to world space; the rest works exactly the same:

    \[ P_{world} = O_{world} + z_{view}D'_{world} \]
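
For instance, this could be done in the vertex shader before the values are passed on. Here is a GLSL-style sketch, assuming an inverseViewMatrix uniform that holds the camera’s world transform (all names are made up for the example):

uniform mat4 inverseViewMatrix;   // the camera's world transform (inverse of the view matrix)

attribute vec3 position;          // full-screen quad vertex, already in NDC
attribute vec3 rayOriginView;     // precomputed O_0 for this corner
attribute vec3 rayDirView;        // precomputed D' for this corner

varying vec3 rayOriginWorld;
varying vec3 rayDirWorld;

void main()
{
    // O_0 is a point (w = 1); D' is a direction (w = 0), so only the rotation applies
    rayOriginWorld = (inverseViewMatrix * vec4(rayOriginView, 1.0)).xyz;
    rayDirWorld = (inverseViewMatrix * vec4(rayDirView, 0.0)).xyz;
    gl_Position = vec4(position, 1.0);
}

In the fragment shader, the reconstruction then simply becomes rayOriginWorld + viewZ * rayDirWorld, with viewZ still computed from the depth buffer as before.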

Conclusion

One final note about the O_0 calculation above. If you’re working with only 2 projection types, you may consider checking for the projection type and simply passing in more easily calculated values: (0, 0, 0) for perspective and the 4 combinations of (\pm \frac{projectionWidth}{2}, \pm \frac{projectionHeight}{2}, 0) for orthographic projections. That’s up to you. Personally, unless absolutely necessary, I’ll always prefer an elegant calculation to a horrible if-statement checking for types!

So finally, I got this post out. I hope it can be of use to someone. As always, if you have any questions or suggestions, feel free to drop a line in the comment box.

14 thoughts on “Reconstructing positions from the depth buffer pt. 2: Perspective and orthographic general case”

  1. Hi.

    I’ve managed to reach this post as I’ve implemented position reconstruction as described here:

    http://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/

    However, it obviously doesn’t work correctly for orthographic projections. Given that I’m drawing light volumes instead of fullscreen quads, I don’t have access to the four interpolated frustum corners. Is there a way to adapt the technique you’ve described to work with perspective and orthographic projections if I only really have access to either the current screen (x, y) position, or the eye-space position of the current fragment in the light volume?

  2. Hi ac7,

    Yes, you can still pass in the 4 frustum corners and interpolate manually using projected coordinates (normalized to [0, 1] range) for each vertex in the light volume geometry, then pass it on to the fragment shader that way. For example:

    float4 screenSpace = mvpMatrix * vertex;
    screenSpace /= screenSpace.w; // [-1, 1] NDC range
    screenSpace = screenSpace * .5 + .5; // [0, 1] range
    out.frustumVec = frustumTopLeft + (frustumBottomRight - frustumTopLeft) * screenSpace.xy;

    There are some obvious optimizations that can be done by moving constant arithmetic out of the shader (e.g. frustumBottomRight - frustumTopLeft can be precomputed), but this should give you an idea.

    Hope that helps,
    David

  3. Hi, thanks for the reply.

    I at least have linear eye-space Z reconstruction working correctly
    (and it works correctly regardless of the projection used, which is
    great, thanks!). The article encouraged me to go back and revisit the
    mathematics of projection. On the platform I’m using, I have very
    little in the way of shader debugging capability, so I want to be
    sure I’ve translated what you’ve written in the abstract to concrete
    shader code. Please pardon the possible stupid questions to follow!

    You’ve suggested that I do manual interpolation of the view rays
    (and additionally, I think, the origins too). I already have the
    texture-space coordinates of the screen-space position of the current
    light volume’s vertex. These are already used in my existing shaders
    to sample from the g-buffer, so that’s not a problem.

    However, given that I’m going to be manually interpolating vectors
    rather than letting the vertex shader do it, I’m somewhat confused
    as to exactly what I need to pass in to the fragment shader.

    Unless I misunderstood, you seemed to be suggesting that I only need to
    pass in the top/left and bottom/right vectors of each of the two sets
    of four frustum corners. This makes intuitive sense: I can obviously
    interpolate between these two corners if I have x/y values in the
    range [0.0, 1.0]. If so… Why would the code need to unproject all
    eight corners if only four (two near, two far) are really required?
    Is this just for the purposes of demonstration?

  4. The approach in the article is different from what you need to do, because it uses quad vertices that match up with the actual frustum rays (and yes, the origins too). In other words, they can just be passed on as they are. Your vertices, however, don’t match the four corners, which is why they need to be interpolated manually in the vertex shader. These are then passed on to the fragment shader, and the GPU will further interpolate them for each fragment.
    The 8 corners are unprojected because that’s required if you’re working in world space instead of view space. In your case you’ll also need to use bilinear interpolation between 4 rays instead of linear interpolation between 2, so the example I posted earlier would not work as-is.
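
    To make that concrete, a rough GLSL-style sketch of the bilinear interpolation in the light volume’s vertex shader might look like this (the array ordering and names are just an example):

    // the four z-normalized frustum rays, e.g. ordered as
    // bottom-left, bottom-right, top-left, top-right (the origins are handled the same way)
    uniform vec3 frustumRays[4];

    vec3 interpolateFrustumRay(vec2 s)   // s = projected vertex position in [0, 1] range
    {
        vec3 bottom = mix(frustumRays[0], frustumRays[1], s.x);
        vec3 top    = mix(frustumRays[2], frustumRays[3], s.x);
        return mix(bottom, top, s.y);
    }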

  5. Hello again.

    Quick note: Unless I’m mistaken, the function to calculate view rays needs to have the ray Z component flipped if working in a system with a right-handed view space (most OpenGL systems, I’d imagine):

    ray /= -ray.z

    Otherwise the resulting view rays will point towards positive Z (out of the scene rather than into it).

  6. Hi ac7,

    Hm, not really, I’d expect the linear z calculated from depth would already have the correct sign, so when multiplied with the ray (having z == +1), it will point the right way.

    Cheers,
    David

  7. Ah yes, of course… Multiplying a negative linear Z value by a negative-Z pointing ray is going to result in a positive-Z pointing ray.

  8. Hi again. One last one, I think, as I’ve implemented the system you’ve described and it works flawlessly, great work!

    I notice that during the unprojection of the frustum corners, you do this:

    Vector3D nearPos = inverseProjection * nearHomogenousCorners[i];
    ...
    nearPos /= nearPos.w;

    My understanding is that view-space coordinates are transformed to clip-space by multiplication by the projection matrix, and then transformed to normalized-device space by division by their own w components.

    Given that the near/far corners are specified in normalized-device space, I’m not sure I understand how a multiplication by the inverse of the projection matrix followed by a division by w gets one back to view-space from normalized-device space. Obviously if the coordinates were specified as clip-space coordinates, the inverse projection matrix would transform them to view-space, but…

  9. It’s easy to get confused there. It’s probably a good idea to read up on homogeneous coordinates in that case. The thing is that we’re working with 4D coordinates, in which scalar multiples are equivalent (ie: P === s*P), in other words they represent the same point. Projections work with homogeneous coordinates, so unprojecting a point from NDC (with w == 1) and dividing the result by w represents the same point as if we had unprojected the original homogeneous coordinate we projected in the first place to obtain said NDC.

    In other words, if the original projection happened like this:
    p_hom = Proj * p_view
    p_ndc = p_hom / p_hom.w

    p_ndc and p_hom represent the same 3D point in homogeneous space, since they’re a scalar multiple of each other.

    Then naturally, you would expect the unprojection to be:
    p_hom = p_ndc * p_hom.w; // but at this point, we don’t know p_hom.w
    p_view = Unproj * p_hom

    So to solve this, we simply use an equivalent point we DO know (ie, p_ndc)
    p_view_hom = Unproj * p_ndc // in this case, p_view_hom.w will no longer be 1, so we divide to obtain regular 3D coordinates representing the same point.
    p_view = p_view_hom / p_view_hom.w

    Hope that helps!

  10. Pingback: Unprojections Explained | Der Schmale - David Lenaerts's blog

  11. Thanks for the article. This page helped verify my math for unprojecting an inverted depth buffer. Just wanted to say thanks!
