I am writing a physics simulation that works like a cellular automaton. Each step depends on the previous one; more precisely, each cell needs its own state and that of its direct neighbors to compute its new state. I am using two buffers that alternate roles at each step (multiple reads / single write).
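To make the update scheme concrete, here is a minimal CPU-side sketch of the two-buffer (ping-pong) stepping. Everything in it is an assumption for illustration and not my actual shader: the names `Rule`, `step` and `simulate`, the row-major `Float32Array` layout, the 4-neighbor stencil, and the clamped borders.

```typescript
// Hypothetical rule signature: new state from the cell and its 4 direct neighbors.
type Rule = (c: number, n: number, s: number, w: number, e: number) => number;

// One whole-grid update: reads only from `src`, writes only to `dst`.
function step(
  src: Float32Array,
  dst: Float32Array,
  width: number,
  height: number,
  rule: Rule,
): void {
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Assumption: borders are clamped (an edge cell reuses its own row/column).
      const yn = Math.max(y - 1, 0);
      const ys = Math.min(y + 1, height - 1);
      const xw = Math.max(x - 1, 0);
      const xe = Math.min(x + 1, width - 1);
      dst[y * width + x] = rule(
        src[y * width + x],
        src[yn * width + x],
        src[ys * width + x],
        src[y * width + xw],
        src[y * width + xe],
      );
    }
  }
}

// Runs `steps` updates, swapping the roles of the two buffers after each one.
function simulate(
  initial: Float32Array,
  width: number,
  height: number,
  steps: number,
  rule: Rule,
): Float32Array {
  let src = Float32Array.from(initial); // copy, so the caller's data is untouched
  let dst = new Float32Array(initial.length);
  for (let t = 0; t < steps; t++) {
    step(src, dst, width, height, rule);
    [src, dst] = [dst, src]; // the buffer just written becomes the next input
  }
  return src;
}
```

The important property is that a step never reads a value written during that same step, which is exactly what has to be guaranteed across workgroups on the GPU.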
I am using WGSL (WebGPU), and for the moment, for every step (a whole-grid update, in other words t+1) I issue a dispatch (to ensure synchronization between steps), but this results in quite slow performance. (EDIT: because I was not making proper use of workgroups)
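For reference, the one-dispatch-per-step pattern looks roughly like the sketch below. It is an illustration under assumptions, not my exact code: `encodeSteps` and the `*Like` interfaces are hypothetical names (the interfaces only mirror the handful of WebGPU methods used, so the snippet type-checks without `@webgpu/types`), and I assume two pre-built bind groups, one for each read/write direction, with all steps recorded into a single command buffer.

```typescript
// Minimal structural types modeled on the WebGPU methods used below (hypothetical names).
interface PassLike {
  setPipeline(pipeline: unknown): void;
  setBindGroup(index: number, bindGroup: unknown): void;
  dispatchWorkgroups(x: number, y?: number, z?: number): void;
  end(): void;
}
interface EncoderLike {
  beginComputePass(): PassLike;
  finish(): unknown;
}
interface DeviceLike {
  createCommandEncoder(): EncoderLike;
  queue: { submit(commandBuffers: unknown[]): void };
}

// Records one compute pass (one dispatch) per simulation step, then submits once.
// bindGroups[0] is assumed to read buffer A and write buffer B; bindGroups[1] the reverse.
function encodeSteps(
  device: DeviceLike,
  pipeline: unknown,
  bindGroups: [unknown, unknown],
  steps: number,
  workgroupsX: number,
  workgroupsY: number,
): void {
  const encoder = device.createCommandEncoder();
  for (let step = 0; step < steps; step++) {
    const pass = encoder.beginComputePass();
    pass.setPipeline(pipeline);
    // Alternate the read/write roles of the two buffers on every step.
    pass.setBindGroup(0, bindGroups[step % 2]);
    pass.dispatchWorkgroups(workgroupsX, workgroupsY);
    pass.end();
  }
  device.queue.submit([encoder.finish()]);
}
```

Each dispatch covers the whole grid, so the step boundary is the only point where all workgroups are known to have finished writing.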
I suspected that the communication between CPU and GPU was the limiting factor (SPOILER ALERT: no, it is not), so I tried to perform the steps in a loop directly in the shader, but I am unable to synchronize all workgroups between steps.
I tried using storageBarrier and workgroupBarrier, but that does not work (the synchronization does not occur). Nonetheless, if I run only two successive steps with one barrier between them, performance doubles, which suggests I am losing most of the time in the dispatch. And the result is almost perfect (meaning some synchronization did not happen, but it did not affect the result much).
EDIT: the previous paragraph was based on a misunderstanding; the result of my test was misleading.
I read that it is impossible to synchronize all workgroups within a single dispatch under the current WGSL specification. But then I don't understand why there is a workgroupBarrier and a storageBarrier?
How can I force all workgroups to synchronize between each step of the cellular automaton?
But more generally, I guess I am not the first person writing a cellular automaton on the GPU with this direct-neighbor dependency: