This might be unlikely, but is there a concept of shared global variables that can be updated in each pass of a fragment shader, in either Metal or OpenGL ES? I want to update some statistics after processing each pixel. I assume there would be a lot of concurrency issues, since the fragment shaders run in parallel.
2 Answers
Something like an atomic_uint? I needed to do something like that recently and that’s what I used. See the answer to my question here: How to implement/use atomic counter in Metal fragment shader?
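The linked answer is Metal-specific, but the same idea can be expressed in OpenGL ES 3.1, where a fragment shader can atomically increment bins held in a shader storage buffer. A minimal sketch, assuming a 256-bin luminance histogram and placeholder names (uImage, vTexCoord); note that fragment-shader SSBO support is optional in ES 3.1 (GL_MAX_FRAGMENT_SHADER_STORAGE_BLOCKS may be 0), so this is not guaranteed to be available:

```glsl
#version 310 es
precision mediump float;

// Assumed layout: 256-bin histogram in an SSBO at binding 0,
// cleared to zero on the CPU before the pass.
layout(std430, binding = 0) buffer Histogram {
    uint bins[256];
};

uniform sampler2D uImage;   // placeholder name
in vec2 vTexCoord;          // placeholder name
out vec4 fragColor;

void main() {
    vec3 rgb = texture(uImage, vTexCoord).rgb;
    // Rec.601 luma, quantised to a bucket index in [0, 255].
    float luma = dot(rgb, vec3(0.299, 0.587, 0.114));
    uint bucket = uint(clamp(luma * 255.0, 0.0, 255.0));
    atomicAdd(bins[bucket], 1u);
    // The colour output is irrelevant; the pass exists only for its side effect.
    fragColor = vec4(0.0);
}
```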

-
This sounds interesting, but may be overkill. All I want is to compute the histogram of an image. To do that, I can check the pixel value in the fragment shader and increment whichever of the 256 counters the pixel intensity falls into. My only worry is how the accumulators get updated when fragment shaders run in parallel. – Deepak Sharma Feb 19 '18 at 11:24
Atomic performance from fragment shaders is likely to suck, to be honest, as you'll get lots of parallel reads and writes from multiple shader cores, and the shaders will be very short if you go for the naive one-fragment-per-input-texel approach.
The usual implementation for this is to encode the histogram as a framebuffer.
Read from the texture in a vertex shader and emit a single point at the position that matches the "histogram" coordinate.
The histogram can then be accumulated using additive blend operations.
Read back the histogram onto the CPU using glReadPixels.
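A minimal sketch of those two shaders in GLSL ES 3.0 (uniform names such as uImage and uImageSize are placeholders): the host draws width * height GL_POINTS with no vertex attributes bound, with blending enabled via glBlendFunc(GL_ONE, GL_ONE), into a 256x1 render target. With an 8-bit target each bin saturates at 255 counts, which is what the tiling discussion below addresses.

```glsl
#version 300 es
// Vertex shader: one point per input texel, scattered into a 256x1 histogram target.
uniform sampler2D uImage;
uniform ivec2 uImageSize;

void main() {
    // Recover this point's texel coordinate from gl_VertexID (no attributes needed).
    ivec2 texel = ivec2(gl_VertexID % uImageSize.x, gl_VertexID / uImageSize.x);
    vec3 rgb = texelFetch(uImage, texel, 0).rgb;
    float luma = dot(rgb, vec3(0.299, 0.587, 0.114));

    // Map the bucket index [0, 255] to the centre of the matching column in clip space.
    float bucket = floor(clamp(luma, 0.0, 1.0) * 255.0);
    float x = (bucket + 0.5) / 256.0 * 2.0 - 1.0;
    gl_Position = vec4(x, 0.0, 0.0, 1.0);
    gl_PointSize = 1.0;
}
```

```glsl
#version 300 es
precision mediump float;
out vec4 outCount;

void main() {
    // With glBlendFunc(GL_ONE, GL_ONE) on an RGBA8 target, each point adds one LSB,
    // i.e. one count, to its histogram bin (saturating at 255).
    outCount = vec4(1.0 / 255.0, 0.0, 0.0, 0.0);
}
```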

-
I know about the glBlend operations for accumulating histogram bins. My question is how such blending occurs in parallel without atomicity. With so many fragment shaders writing in parallel, does it serialize all the writes to the framebuffer? – Deepak Sharma Feb 19 '18 at 12:15
-
Another problem with the blend approach: how do you count beyond 255, since the colour depth is only 8 bits? – Deepak Sharma Feb 19 '18 at 12:16
-
Yes, all blends must be serialized, but so must atomic increments. There are wider colour formats than 8-bit, such as RGB10_A2, which will at least get you up to 1024 values per histogram bin. – solidpixel Feb 19 '18 at 16:28
-
You could e.g. stripe the framebuffer and do multiple parallel histograms which are merged in a second pass, using a map-reduce type algorithm. – solidpixel Feb 19 '18 at 16:30
-
How do I do that? Do you know of any sample code that does parallel histograms on iOS? I am not sure iOS or Android support RGB10_A2. – Deepak Sharma Feb 19 '18 at 16:40
-
But 10 bits per bucket is not enough; I need 32 bits. If parallel histograms are the way to go, please suggest some sample code. – Deepak Sharma Feb 19 '18 at 17:43
-
Tile the input so you have a maximum of 1024 texels per tile (so you cannot saturate a 10-bit histogram). Allocate an output framebuffer with 256 columns and num_tiles rows. Assign points to fragment locations based on tile index (row) and luminance (column). A second pass reads from the intermediate framebuffer and merges the num_tiles rows into a true UI32 framebuffer. – solidpixel Feb 19 '18 at 21:47
-
Just two passes. One to effectively create a list of histograms, one for each 1024-texel tile. One to merge this list of histograms into a single histogram. – solidpixel Feb 21 '18 at 21:59
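A sketch of that second (merge) pass, assuming the intermediate texture is 256 columns by num_tiles rows and the final target is a single 256x1 float-renderable texture (e.g. R32F via EXT_color_buffer_float); the uniform names uPartial and uNumTiles are placeholders:

```glsl
#version 300 es
precision highp float;

// Merge pass: the intermediate texture holds one partial histogram per row
// (256 columns x uNumTiles rows). Sum each column into a single 256x1 output.
uniform sampler2D uPartial;
uniform int uNumTiles;

out vec4 outTotal;

void main() {
    int bucket = int(gl_FragCoord.x);   // output framebuffer is 256x1
    float total = 0.0;
    for (int row = 0; row < uNumTiles; ++row) {
        // Each intermediate texel stores counts as a normalised value.
        total += texelFetch(uPartial, ivec2(bucket, row), 0).r;
    }
    // Rescale back to an absolute count; the factor matches the intermediate
    // format (255.0 for RGBA8, 1023.0 for RGB10_A2).
    outTotal = vec4(total * 255.0, 0.0, 0.0, 1.0);
}
```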
-
OK, I am trying this approach now. The question is: what vertices do I pass to the vertex shader? In the vertex shader I need to determine the fragment location and do something like gl_Position = frag_location and gl_PointSize = 1.0. If I need to determine frag_location in the vertex shader, it seems I need to pass all the coordinates of the image as vertices and use GL_POINTS as the primitive. – Deepak Sharma Feb 24 '18 at 07:23
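One way to avoid uploading per-texel vertex data: in ES 3.0 the vertex shader can derive everything from gl_VertexID, so the host just calls glDrawArrays(GL_POINTS, 0, width * height) with no attributes bound. A hedged sketch of the scatter pass for the tiled variant described above, with placeholder uniform names:

```glsl
#version 300 es
// Scatter pass for the tiled histogram: no vertex attributes are needed.
uniform sampler2D uImage;
uniform ivec2 uImageSize;     // input image dimensions
uniform int uNumTiles;        // number of 1024-texel tiles (= rows in the target)

void main() {
    // Recover the texel coordinate for this point from gl_VertexID.
    ivec2 texel = ivec2(gl_VertexID % uImageSize.x, gl_VertexID / uImageSize.x);
    float luma = dot(texelFetch(uImage, texel, 0).rgb, vec3(0.299, 0.587, 0.114));

    // Column = luminance bucket, row = which 1024-texel tile this point belongs to.
    float bucket = floor(clamp(luma, 0.0, 1.0) * 255.0);
    int tile = gl_VertexID / 1024;

    // Map (bucket, tile) to clip space for a 256 x uNumTiles point target.
    float x = (bucket + 0.5) / 256.0 * 2.0 - 1.0;
    float y = (float(tile) + 0.5) / float(uNumTiles) * 2.0 - 1.0;
    gl_Position = vec4(x, y, 0.0, 1.0);
    gl_PointSize = 1.0;
}
```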