April 2016 – EricPolman.com

I managed to implement a very naive version of Reflective Shadow Maps (an algorithm described in this paper). This post will explain how I did that and what the pitfalls were. It will also cover some possible optimizations.

Figure 1: From left to right: Render without Reflective Shadow Maps, render with reflective shadow maps, difference

The result

In figure 1 you see one of the results produced by RSM. The images use the Stanford Bunny and three differently colored quads. The left image shows a render without RSM, using just a spot light; whatever falls in the shadow is completely black. The middle image shows the same scene rendered with RSM. Notable differences are the brighter colors everywhere, the pink color bleeding onto the floor and the bunny, and the shadow no longer being completely black. The right image shows the difference between the two, that is, what RSM contributed to the image. You might see some harder edges and artifacts in the middle and right images, but those can be solved by tweaking the sample kernel size, the indirect light intensity, and the number of samples taken.

The implementation

The engine I implemented this algorithm in has a cross-platform rendering architecture allowing us to create rendering techniques (like deferred shading, shadow mapping, etc.) that will theoretically work on any platform we support. The architecture was set up to be multi-threading compatible and as stateless as possible. It also uses a lot of terminology found in DirectX 11 and 12. The shaders were written in HLSL and the renders made with DirectX 11. Keep this in mind when I talk about implementation details.

I had already set up a deferred renderer with shadow maps for directional lights prior to writing this article. Then I implemented RSM for directional lights. After that, I added spot light shadow maps and added support for RSM to them.

Expanding the shadow map

Traditionally, Shadow Maps (SM) are no more than a depth map. This means you don’t even need a pixel/fragment shader for filling an SM. However, for RSM, you need a few extra buffers. You need to store the world space positions, world space normals, and the flux. This means you need multiple render targets and a pixel/fragment shader to fill them. Keep in mind that you need to cull back faces instead of front faces for this technique. Using front face culling is a commonly used technique to avoid shadow artifacts, but this does not work with RSM.

You pass the world space normal and position to the pixel shader and pass those through to the corresponding buffers. If you have normal mapping, you calculate that in the pixel shader as well. The flux is calculated in the pixel shaders and is the albedo of the material multiplied by the light’s color. For spot lights, you multiply this by the falloff. For directional lights, this will simply look like an unshaded image.

Preparing the shading pass

For the shading pass, you need to do a few things. You need to bind all buffers used in the shadow pass as textures. You also need random numbers. The paper tells you to precalculate those numbers and store them in a buffer in order to save operations in the sampling pass. Since the algorithm is heavy in terms of performance, I thoroughly agree with the paper. The authors also recommend this for temporal coherency: it avoids the flickering you would get if every frame used a different set of samples.

You need two random floats in the [0, 1] range per sample you take. These random numbers will be used to determine the coordinates of a sample. You will also need the same matrix you use to transform world space positions to shadow map texture space positions. Beyond that, a non-comparison sampler that clamps to a black border color is also necessary.
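A minimal sketch of how such a sample buffer could be generated on the CPU, assuming the polar pattern from the paper: the two random numbers ξ1 and ξ2 become the offset ξ1 · (sin 2πξ2, cos 2πξ2), and ξ1² is kept as a weight. The names `RsmSample` and `generateRsmSamples`, the seed, and the choice to store the weight next to the offset are assumptions for illustration, not necessarily how the engine in this post lays out its buffer.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical layout: x and y are the offset inside the unit disk, weight is
// xi1 squared, which compensates for the denser sampling near the center.
struct RsmSample { float x, y, weight; };

// A fixed seed produces the same pattern every frame, which is what keeps
// the indirect light temporally coherent.
std::vector<RsmSample> generateRsmSamples(std::size_t count, unsigned seed = 1234u)
{
    const float twoPi = 6.28318530718f;
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> uniform01(0.0f, 1.0f);

    std::vector<RsmSample> samples;
    samples.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        const float xi1 = uniform01(rng); // distance from the center
        const float xi2 = uniform01(rng); // angle, as a fraction of a full turn
        samples.push_back({ xi1 * std::sin(twoPi * xi2),
                            xi1 * std::cos(twoPi * xi2),
                            xi1 * xi1 });
    }
    return samples;
}
```

The resulting array would be uploaded once, for example into a constant or structured buffer, and scaled by rMax in the shader.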

Performing the shading pass

This is the hard part, especially to get it right. I recommend doing the indirect shading pass after you have done the direct shading for a particular light. The indirect pass needs a full screen quad, which is what you already render for directional lights. For spot and point lights, however, you generally want to use shaped meshes with some form of culling to fill fewer pixels.

I will show a piece of code below that calculates the indirect shading per pixel. After that, I will step through the code and explain what is happening.

[code language="cpp"]
float3 DoReflectiveShadowMapping(float3 P, bool divideByW, float3 N)
{
    // Project the world space position into the light's texture space.
    float4 textureSpacePosition = mul(lightViewProjectionTextureMatrix, float4(P, 1.0));
    if (divideByW) textureSpacePosition.xyz /= textureSpacePosition.w;

    float3 indirectIllumination = float3(0, 0, 0);
    float rMax = rsmRMax;

    for (uint i = 0; i < rsmSampleCount; ++i)
    {
        // Precalculated offset for this sample.
        float2 rnd = rsmSamples[i].xy;

        float2 coords = textureSpacePosition.xy + rMax * rnd;

        // Fetch the pixel light (position, normal, flux) from the RSM buffers.
        float3 vplPositionWS = g_rsmPositionWsMap.Sample(g_clampedSampler, coords.xy).xyz;
        float3 vplNormalWS = g_rsmNormalWsMap.Sample(g_clampedSampler, coords.xy).xyz;
        float3 flux = g_rsmFluxMap.Sample(g_clampedSampler, coords.xy).xyz;

        // Irradiance contributed by this pixel light (figure 2).
        float3 result = flux
            * ((max(0, dot(vplNormalWS, P - vplPositionWS))
            * max(0, dot(N, vplPositionWS - P)))
            / pow(length(P - vplPositionWS), 4));

        // Apply the per-sample weight and accumulate.
        result *= rnd.x * rnd.x;
        indirectIllumination += result;
    }
    return saturate(indirectIllumination * rsmIntensity);
}
[/code]

The first argument of the function is P, the world space position for a specific pixel. divideByW enables the perspective divide required to get a correct Z value. N is the world space normal at that pixel.

[code language="cpp"]
float4 textureSpacePosition = mul(lightViewProjectionTextureMatrix, float4(P, 1.0));
if (divideByW) textureSpacePosition.xyz /= textureSpacePosition.w;

float3 indirectIllumination = float3(0, 0, 0);
float rMax = rsmRMax;
[/code]

This section sets up the texture space position, initializes the indirect lighting contribution that the samples accumulate into, and sets the rMax variable found in the lighting equation in the paper, which I will cover in the next section. Basically, rMax is the maximum distance the random sample can be from the texture space position.
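To make the role of divideByW concrete, here is a CPU-side sketch of what a matrix like lightViewProjectionTextureMatrix achieves after the view-projection step. It assumes the Direct3D convention (x and y in [-1, 1] after the divide, texture v pointing down); `clipToTextureSpace` and the two structs are hypothetical helpers, and the sketch divides before applying the scale and bias, which gives the same result as baking the scale and bias into the matrix and dividing afterwards.

```cpp
struct Float4 { float x, y, z, w; };
struct Float3 { float x, y, z; };

// clip: a position already multiplied by the light's view-projection matrix.
// For an orthographic (directional) light w is 1, so the divide changes nothing;
// for a perspective (spot) light it is required.
Float3 clipToTextureSpace(Float4 clip, bool divideByW)
{
    if (divideByW)
    {
        clip.x /= clip.w;
        clip.y /= clip.w;
        clip.z /= clip.w;
    }
    // [-1, 1] -> [0, 1]; v is flipped because texture space has its origin
    // in the top-left corner.
    return { clip.x * 0.5f + 0.5f, clip.y * -0.5f + 0.5f, clip.z };
}
```

With these assumptions, the center of the light's view maps to (0.5, 0.5) and its top-right corner maps to (1, 0).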

[code language="cpp"]
for (uint i = 0; i < rsmSampleCount; ++i)
{
    float2 rnd = rsmSamples[i].xy;

    float2 coords = textureSpacePosition.xy + rMax * rnd;
[/code]

Here we open the loop and prepare our variables for the equation. In order to optimize it a bit further, the random samples that I calculated are already coordinate offsets, meaning I only have to add rMax * rnd to the texture space coordinates to get my UV coordinates. If the UV coordinates fall outside of the [0,1] range, the samples will be black. This is logical, since such a sample falls outside of the light’s range and thus has no shadow map point to sample from.

[code language="cpp"]
    float3 result = flux
        * ((max(0, dot(vplNormalWS, P - vplPositionWS))
        * max(0, dot(N, vplPositionWS - P)))
        / pow(length(P - vplPositionWS), 4));

    result *= rnd.x * rnd.x;
    indirectIllumination += result;
}
return saturate(indirectIllumination * rsmIntensity);
[/code]

This is the part where the indirect lighting equation (displayed in figure 2) is evaluated and weighted by the distance between the point and the pixel light. The equation looks daunting and the code doesn’t really tell you what’s going on either, so I will explain. The variable Φ (phi) is the flux, which is the radiant intensity. The previous article describes this in more detail.

The flux (Φ) is scaled by two dot products. The first dot product is between the pixel light normal and the direction from the pixel light to the surface point. The second dot product is between the surface normal and the direction from the surface point to the pixel light. In order to not get inverted light contributions, those dot products are clamped to the [0, ∞) range. In this equation the normalization step is done last, I assume for performance reasons: the directions are left unnormalized and the result is divided by the distance to the fourth power. It is equally valid to normalize the directions before doing the dot products.
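The claim that normalizing first is equally valid can be checked on the CPU. The sketch below evaluates the contribution of one pixel light both ways: with unnormalized directions divided by the distance to the fourth power, as in the shader, and with normalized directions divided by the distance squared. `Vec3` and the function names are hypothetical, and the flux is reduced to a single channel to keep it short.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
float length(Vec3 v) { return std::sqrt(dot(v, v)); }

// Same form as the shader: unnormalized directions, divided by distance^4.
float irradianceUnnormalized(float flux, Vec3 P, Vec3 N, Vec3 vplP, Vec3 vplN)
{
    const float d = length(sub(P, vplP));
    return flux * std::max(0.0f, dot(vplN, sub(P, vplP)))
                * std::max(0.0f, dot(N, sub(vplP, P)))
                / (d * d * d * d);
}

// Normalize the direction first; each dot product then loses one factor
// of distance, so only distance^2 remains in the denominator.
float irradianceNormalized(float flux, Vec3 P, Vec3 N, Vec3 vplP, Vec3 vplN)
{
    const Vec3 toP = sub(P, vplP);
    const float d = length(toP);
    const Vec3 dir = { toP.x / d, toP.y / d, toP.z / d };
    const Vec3 negDir = { -dir.x, -dir.y, -dir.z };
    return flux * std::max(0.0f, dot(vplN, dir))
                * std::max(0.0f, dot(N, negDir))
                / (d * d);
}
```

For a surface point at the origin with normal (0, 1, 0) and a pixel light at (1, 1, 0) with normal (-1, 0, 0) and a flux of 1, both functions return 0.25 up to floating-point rounding.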

Figure 2: The equation for irradiance at a point in space by pixel light p
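Written out with symbols that map onto the code above (x is P, n is N, x_p is vplPositionWS, n_p is vplNormalWS, and Φ_p is flux), the equation from the paper reads as follows. This is a transcription that matches the shader term by term, not a new derivation.

$$
E_p(x, n) = \Phi_p \, \frac{\max\{0, \langle n_p \mid x - x_p \rangle\} \; \max\{0, \langle n \mid x_p - x \rangle\}}{\lVert x - x_p \rVert^{4}}
$$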

The result from this shader pass can be blended on a backbuffer and will give results as seen in figure 1.

Pitfalls

While implementing this algorithm, I ran into some issues. I will cover these issues to help others avoid making the same mistakes.

Incorrect sampler

I spent a considerable amount of time figuring out why my indirect light seemed to repeat itself. Crytek’s Sponza does not have its UV coordinates in the [0,1] range, so we needed a sampler that wrapped. This is, however, a horrible property when you are sampling from (reflective) shadow maps: samples that fall outside the map wrap around and pick up light from the opposite side instead of returning black.
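The difference between the two addressing modes is easy to see in a one-dimensional CPU emulation. This is only an illustration with hypothetical helper names (`sampleWrap`, `sampleBorder`), nearest-texel lookup, and a non-empty texel array; in Direct3D 11 the real switch is the address mode on the sampler state (`D3D11_TEXTURE_ADDRESS_WRAP` versus `D3D11_TEXTURE_ADDRESS_BORDER` with a black border color).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Wrap addressing: coordinates outside [0, 1) repeat the texture, so a
// sample far outside the light's range still returns some texel's flux.
float sampleWrap(const std::vector<float>& texels, float u)
{
    const float wrapped = u - std::floor(u);
    const std::size_t last = texels.size() - 1;
    std::size_t index = static_cast<std::size_t>(wrapped * texels.size());
    if (index > last) index = last;
    return texels[index];
}

// Border addressing with a black border: anything outside [0, 1]
// contributes no light at all.
float sampleBorder(const std::vector<float>& texels, float u)
{
    if (u < 0.0f || u > 1.0f) return 0.0f;
    const std::size_t last = texels.size() - 1;
    std::size_t index = static_cast<std::size_t>(u * texels.size());
    if (index > last) index = last;
    return texels[index];
}
```

With four texels holding 1, 2, 3, and 4, a lookup at u = 1.25 returns 2 with wrapping (the repeated indirect light) and 0 with the black border.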

Tweakable values

To improve my workflow, it was vital to have some variables tunable at the touch of a button. I can increase the intensity of the indirect lighting and the sampling range (rMax). For reflective shadow mapping, these variables should be tweakable per light. If you sample in a big range, you get lighting from everywhere, which is useful for big scenes. For more local indirect lighting, you will need a smaller range. Figure 3 shows global and local indirect lighting.

Figure 3: Demonstration of rMax sensitivity.

Separate pass

Initially I thought I could do the indirect lighting in the shader that does the light gathering for deferred rendering. For directional lights, this works, because you render a full screen quad anyway. However, for spot and point lights, you try to minimize the fill rate. I decided to move the indirect lighting to a separate pass, something that is necessary if you want to do the screen space interpolation as well.

Cache inefficient by nature

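The algorithm is horribly cache inefficient, because it samples randomly around a point in multiple textures. The number of samples taken without optimization is unacceptably high as well. With a resolution of 1280 * 720 and a sample count of 400, you evaluate 1280 * 720 * 400 = 368,640,000 samples per light, and since every sample reads three textures (position, normal, and flux), that adds up to 1,105,920,000 texture fetches per light.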

Pros & cons

I will list the pros and cons of this indirect lighting algorithm that I have encountered. I do not have a lot to compare it to, since this is the first one I have implemented.

Pros	Cons
Easy to understand algorithm	Very cache inefficient
Integrates neatly with a deferred renderer	Requires tweaking of variables
Can be used in other algorithms (LPV)	Forced choice between local and global indirect light

Optimizations

I have made some attempts to increase the speed of this algorithm. As discussed in the paper (link at the top of this page), the authors perform a screen space interpolation. I got this to work and it sped up the rendering quite a bit. Below I will describe what I have done and compare (in frames per second) the following states using my 3-walls-with-bunny scene: no RSM, naïve RSM, and interpolated RSM.

Z-check

One reason my RSM was underperforming was that I was also evaluating the pixels that were part of the skybox. A skybox definitely does not need indirect lighting. The speedup this gives depends on how much of the skybox is actually visible.

Pre-calculating random samples on the CPU

Pre-calculating the random samples not only gives you more temporal coherency, it also saves you from having to regenerate those samples in the shaders.

Screen space interpolation

The paper proposes to use a low resolution render target for evaluating the indirect lighting. For scenes with a lot of smooth normals and straight walls, lighting information can easily be interpolated between lower resolution points. I am not going to describe this interpolation in detail, to keep this article a bit shorter.

Results and conclusion

Below are my results for a few different sample counts. I have a few observations on these results.

Logically, the FPS stays around 700 for different sample counts when there is no RSM calculation done.
Interpolation brings some overhead and becomes less useful with low sample counts.
Even with 100 samples, the resulting image looked pretty good. This might be due to the interpolation which is “blurring” the indirect light. This makes it look more diffuse.

Sample count	FPS for No RSM	FPS for Naive RSM	FPS for Interpolated RSM
100	~700	152	264
200	~700	89	179
300	~700	62	138
400	~700	44	116

Reflective Shadow Maps: Part 2 – The implementation