Published August 1, 2020 | 9 minute read
How do you average (or sum) a lot of numbers, quickly?
More precisely, how do you do this on a webpage, preferably in real-time, when your values are stored as pixels in a large image? In general-purpose applications, we have access to the massive parallelization capabilities of GPUs with CUDA or OpenCL — but on the web, we’re stuck with WebGL and fragment shaders.
One possibility is to write a fragment shader that repeatedly downscales the image by a factor of 2, where each output pixel is the average of the four pixels in its preimage. Assuming each render takes constant time due to GPU parallelization, this technique can average an $n \times n$ texture in $O(\log n)$ time.
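To make the idea concrete, here is a CPU-side model of that downscaling pass in Python. This is purely illustrative (the real implementation would be a fragment shader): each pass replaces every 2×2 block with its average, halving the sidelength.

```python
# Illustrative CPU-side model of the downscaling shader: each pass averages
# every 2x2 block, halving the sidelength of the grid.

def downscale_once(img):
    """Halve a square grid (even sidelength), averaging each 2x2 block."""
    half = len(img) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
              img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(half)]
            for r in range(half)]

def average_by_downscaling(img):
    """Repeatedly downscale a 2^k x 2^k grid; the final cell is the mean."""
    while len(img) > 1:
        img = downscale_once(img)
    return img[0][0]
```

For a sidelength-$2^k$ grid this takes $k$ passes, which is where the $O(\log n)$ bound comes from, assuming each pass renders in constant time on the GPU.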
This works, but has a few disadvantages:
- Managing the chain of intermediate framebuffers and viewport sizes (glViewport) is a pain.
- It works most cleanly on power-of-two sidelengths.
An alternative technique that “resolves” some of these issues is to just blur the image repeatedly until it becomes a solid color, then read an arbitrary pixel.
To obtain an accurate average and avoid edge-effects, we treat the texture as a torus. What the blur operation is really doing, then, is circularly convolving the input image with a Gaussian kernel $K$:

$$I_{t+1} = I_t \circledast K$$
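Here is what one toroidal blur pass looks like as an illustrative, CPU-side Python sketch; the sparse dict-of-offsets kernel representation is my own convention, not the post's:

```python
# One blur pass on the torus: indices wrap modulo n in both axes, so every
# pixel sees a full neighborhood and the image mean is exactly preserved
# (because the kernel weights sum to 1).

def circular_convolve(img, kernel):
    """kernel: dict mapping (dr, dc) offsets to weights summing to 1."""
    n = len(img)
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            out[r][c] = sum(w * img[(r - dr) % n][(c - dc) % n]
                            for (dr, dc), w in kernel.items())
    return out
```

The wrap-around indexing is exactly the torus treatment: no pixel is ever missing neighbors, so no mass leaks out at the edges.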
As you may have guessed from the above images, however, repeated Gaussian blurring is really bad for this purpose. The issue is that the Gaussian kernel is very local, so information doesn’t spread very fast throughout the image.
Perhaps a kernel that’s more spread out would do better — indeed, there’s no reason for the kernel to even be contiguous!
But first, we need a way to quantify the “quality” of a kernel $K$. Let $I_0$ denote the original image, and let $I_t$ denote the image after convolving with $K$ a total of $t$ times. Then $I_t = I_{t-1} \circledast K$, and taking the 2D discrete Fourier transform yields $\hat{I}_t = \hat{I}_{t-1} \cdot \hat{K}$ (elementwise). From here, we easily obtain $\hat{I}_t = \hat{I}_0 \cdot \hat{K}^t$.
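We can sanity-check this identity numerically with a naive $O(n^4)$ DFT on a tiny image. This is a Python sketch under the same definitions, not code from the post:

```python
import cmath

def dft2(img):
    """Naive 2D DFT of an n-by-n image (fine for tiny n)."""
    n = len(img)
    w = lambda k: cmath.exp(-2j * cmath.pi * k / n)
    return [[sum(img[x][y] * w(i * x + j * y)
                 for x in range(n) for y in range(n))
             for j in range(n)]
            for i in range(n)]

def conv2_circ(a, b):
    """Circular (toroidal) convolution of two n-by-n images."""
    n = len(a)
    return [[sum(a[x][y] * b[(i - x) % n][(j - y) % n]
                 for x in range(n) for y in range(n))
             for j in range(n)]
            for i in range(n)]
```

Convolving twice with a kernel and comparing transforms confirms that each coefficient of $\hat{I}_t$ is the corresponding coefficient of $\hat{I}_0$ scaled by $\hat{K}[i,j]^t$.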
The image we want to eventually converge to has all pixels equal to the average value of $I_0$, so that $\hat{I}_\infty[0, 0] = \hat{I}_0[0, 0]$ is the only non-zero coefficient in $\hat{I}_\infty$. Intuitively, the “quality” of the kernel should quantify how quickly the $I_t$ converge to $I_\infty$, say in $L^2$ norm. By Parseval’s theorem, it is equivalent to study the convergence of $\hat{I}_t$ to $\hat{I}_\infty$ in $\ell^2$. To do this, we’ll first need to deduce a bit more about $\hat{K}$.
Let $\delta_{xy}$ for $0 \le x, y < n$ denote the image with value 1 at pixel $(x, y)$ and 0 elsewhere. Then every coefficient in $\hat{\delta}_{xy}$ has magnitude 1, by definition. If we assume that the kernel $K$ is a convex linear combination of the $\delta_{xy}$ (i.e., its weights are non-negative and sum to 1), then every coefficient in its Fourier transform is a convex linear combination of norm-1 complex numbers, and thus has norm at most 1. Thus all components of $\hat{I}_t$ either decay exponentially to zero or remain fixed as $t \to \infty$.
In fact, since all $K[x, y] \ge 0$ and $\sum_{x, y} K[x, y] = 1$, we have $\hat{K}[0, 0] = 1$, and so all $I_t$ share the same average value as $I_0$. Letting $\hat{K}^\dagger$ denote $\hat{K}$ with the $[0, 0]$ entry set to zero, we finally define the quality of $K$ to be

$$Q(K) = \frac{1}{\max_{i, j} \left| \hat{K}^\dagger[i, j] \right|}.$$
Put simply, we look at the largest coefficient of $\hat{K}$ by magnitude, excluding $\hat{K}[0, 0]$, and take its reciprocal. Then the rate of convergence of $\hat{I}_t \to \hat{I}_\infty$ is on the order of $Q(K)^{-t}$: our average becomes roughly $Q(K)$ times better on each iteration.
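In code, the quality measure looks like this (a Python sketch; the sparse-dict kernel format is an assumption of mine):

```python
import cmath

def quality(kernel, n):
    """Q(K) = 1 / max |K-hat[i, j]| over (i, j) != (0, 0), on an n-by-n image.

    kernel: dict mapping (x, y) pixel coordinates to weights summing to 1.
    """
    worst = 0.0
    for i in range(n):
        for j in range(n):
            if (i, j) == (0, 0):
                continue
            # Fourier coefficient of the kernel at frequency (i, j).
            coeff = sum(w * cmath.exp(-2j * cmath.pi * (i * x + j * y) / n)
                        for (x, y), w in kernel.items())
            worst = max(worst, abs(coeff))
    # A perfectly uniform kernel has worst == 0, i.e. infinite quality.
    return float("inf") if worst == 0.0 else 1.0 / worst
```

A one-pixel (delta) kernel scores $Q = 1$: it never mixes pixels, so it never converges. A 2×2 box on a 4×4 image scores $\sqrt{2}$.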
Sidenote: We can now understand precisely why the Gaussian kernel is so bad: the magnitude of the Fourier transform of a Gaussian is itself a Gaussian, which has a very “flat” peak near $[0, 0]$. Thus the Fourier coefficients near $[0, 0]$ have magnitudes almost exactly equal to 1.
We now have a clearly defined task: find a kernel $K$ of maximal quality $Q(K)$.
Clearly, the optimal quality is $Q(K) = \infty$, attained when all coefficients of $\hat{K}$ other than $\hat{K}[0, 0]$ are zero. This corresponds to a filter with equal weight $1/n^2$ on every pixel of the image, so we converge in one step. End of post!
Of course, in practice this would be horrendously slow, even on the GPU. The performance of our fragment shader is related to the number of input pixels we access per output pixel — i.e., the number of non-zero entries of $K$. So, our task is really:
Find a kernel $K$ of maximal quality $Q(K)$, subject to $K$ having at most $s$ non-zero entries.
There’s just one issue: this task is horrendously difficult.
This forces us to turn to brute-force Monte-Carlo methods. The procedure I used is as follows:

1. Choose $s$ pixel locations uniformly at random to serve as the kernel's support.
2. Optimize the weights at those locations to maximize the quality $Q(K)$.
3. Repeat.
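Here is a sketch of the random search in Python, with the weight-optimization step (Step 2) elided: every chosen pixel simply gets weight $1/s$. The `trials` parameter and the scoring loop are my own illustrative choices, not the post's exact code:

```python
import cmath
import random

def best_random_kernel(n, s, trials, seed=0):
    """Score `trials` random s-sparse equal-weight kernels on an n-by-n
    torus; return (best quality found, its support)."""
    rng = random.Random(seed)
    best_q, best_support = 0.0, None
    for _ in range(trials):
        support = [(rng.randrange(n), rng.randrange(n)) for _ in range(s)]
        worst = 0.0
        for i in range(n):
            for j in range(n):
                if (i, j) == (0, 0):
                    continue
                # Fourier coefficient of the equal-weight sparse kernel.
                coeff = sum(cmath.exp(-2j * cmath.pi * (i * x + j * y) / n)
                            for x, y in support) / s
                worst = max(worst, abs(coeff))
        q = 1.0 / worst
        if q > best_q:
            best_q, best_support = q, support
    return best_q, best_support
```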
Keeping track of the maximum-quality kernel during this procedure should give a good estimate of the true solution. Here are some experimental results for a square image of fixed sidelength:
As expected, the quality increases as the sparsity budget $s$ is increased. For a given $s$, the median kernel quality, in the grand scheme of things, isn’t that much worse than that of the best observed candidate.
So perhaps we don’t need to find the absolute best kernel for a given sparsity; using $s$ randomly chosen pixel locations yields comparable quality, and we can just increase $s$ by a bit if it’s not good enough.
How can we understand this result theoretically? The optimization step (Step 2) is quite difficult to think about, so let’s assume that all randomly chosen pixels end up with equal weights.
Consider an arbitrary coefficient $\hat{K}[i, j]$, excluding $[0, 0]$. This coefficient will be the average of $\hat{\delta}_{xy}[i, j]$ for $s$ randomly chosen pairs $(x, y)$. Since $\hat{\delta}_{xy}[i, j]$ has unit norm and essentially random phase, its real part has a mean of zero and a variance of $1/2$. Assuming the terms are roughly independent and $s$ is relatively large, we have $\operatorname{Re} \hat{K}[i, j] \sim \mathcal{N}(0, 1/(2s))$ by the Central Limit Theorem. The same holds for the imaginary component, so $|\hat{K}[i, j]|$ has the Rayleigh distribution

$$f(x) = \frac{x}{\sigma^2} e^{-x^2 / (2\sigma^2)},$$

where $\sigma^2 = 1/(2s)$.
Since the kernel is real-valued, half of its Fourier coefficients will be redundant, so we may treat $\max_{i,j} |\hat{K}^\dagger[i, j]|$ as the maximum of $m = n^2/2$ independent Rayleigh-distributed random variables. The CDF of each Rayleigh-distributed variable is $F(x) = 1 - e^{-x^2/(2\sigma^2)}$, so the expected value of the maximum is

$$\mathbb{E}[\max] = \int_0^\infty \left(1 - F(x)^m\right) dx.$$

Assuming $m$ and $s$ are both fairly large, the integrand can be well-approximated as a step function that transitions from 1 to 0 at the $x$-value satisfying $F(x)^m = 1/2$. Solving, we obtain

$$\mathbb{E}[\max] \approx \sigma \sqrt{2 \ln \frac{m}{\ln 2}} = \sqrt{\frac{1}{s} \ln \frac{n^2}{2 \ln 2}}.$$
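The step-function approximation is easy to sanity-check by simulation (a Python sketch; the particular $m$, $\sigma$, and trial count below are arbitrary assumptions):

```python
import math
import random

def predicted_max(m, sigma):
    """Step-function estimate: the x where F(x)^m = 1/2, i.e. where
    m * exp(-x^2 / (2 sigma^2)) = ln 2."""
    return sigma * math.sqrt(2 * math.log(m / math.log(2)))

def simulated_max(m, sigma, trials, seed=0):
    """Average, over many trials, of the max of m Rayleigh(sigma) samples.

    Rayleigh samples via inverse-CDF: x = sigma * sqrt(-2 ln U), U in (0, 1].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
                     for _ in range(m))
    return total / trials
```

With, say, $m = 512$ and $\sigma = 0.25$, the simulated mean of the maximum lands within a few percent of the step-function prediction.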
The median quality should be roughly the inverse of this. With fingers crossed, we compare the theory to experiment:
Not bad, but not great. The actual median is a bit higher, presumably due to the optimization step (Step 2) that we overlooked. Indeed, if we disable the optimization step and weight each kernel entry equally, the results match quite closely:
Thus we can expect our estimate to provide a reliable lower bound for the optimized median. This is actually great news: even randomly choosing pixels and setting equal weights, to achieve a target quality $Q$, we need only choose a filter with a size of

$$s \approx Q^2 \ln \frac{n^2}{2 \ln 2},$$
which grows only logarithmically with the image size! Thus, even for fairly large images, the kernel remains sparse enough for near-constant-time evaluation on the GPU.
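Rearranged for the sparsity budget, the formula is a one-liner (Python sketch, using the expression derived above):

```python
import math

def entries_needed(n, q_target):
    """Nonzero kernel entries for a random equal-weight kernel on an n-by-n
    image to reach median quality ~q_target: s = Q^2 * ln(n^2 / (2 ln 2))."""
    return math.ceil(q_target ** 2 * math.log(n ** 2 / (2 * math.log(2))))
```

At a fixed target quality, doubling the sidelength adds only a handful of entries, which is exactly the logarithmic growth noted above.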
Back to the example image I’ve been using. Plugging its sidelength and a target quality into our formula gives the number of random kernel entries we need. Here are the results with no kernel optimization (equal weights):
Here are the results after optimizing the kernel (Step 2 in the procedure described before):
Yep. Even with the unoptimized kernel, after two iterations the average in each channel is resolved to within a barely perceptible error. For comparison, the Gaussian kernel I used at the beginning of the post had many more non-zero entries.
To achieve the same result with the scaling-based algorithm outlined at the start of the post, you would need to scale down by a factor of $\sqrt{n}$ at each step, so each output pixel would need to look at $n$ input pixels. Admittedly, this isn’t a direct comparison, since the scaling method provides an exact solution, but for many applications, a rapidly converging approximate solution does the job.
And finally, for fun, a demo that calculates the value of $\pi$ by averaging an image of a circle! The estimate shown is derived from a single pixel in the displayed image.
Texture Size: 300 × 300
Estimated Quality: Q(K) = 2.094
The built-in glGenerateMipmap function executes essentially this procedure, and is restricted to power-of-two sidelengths only. WebGL 2 supports non-power-of-two textures, but there’s no guarantee the edge effects will be treated so as to give an accurate average...↩