I'm wondering whether a certain problem can be solved. In short: how to get optimal performance when filling a buffer not only line by line but also column by column.
Details below. A graphics buffer is given (i.e. one intended to hold a bitmap):
#include <stdlib.h>

#define WIDTH 320
#define HEIGHT 256
typedef struct
{
    unsigned char r, g, b, a;
} sRGBA;
sRGBA* bufor_1;
int main(void)
{
    bufor_1 = (sRGBA*)malloc(WIDTH*HEIGHT*sizeof(sRGBA));
    return 0;
}
Filling it horizontally, line by line, is not a problem, because that is the cache-friendly case, which is the best one, e.g. floor and ceiling raycasting:
int main(void)
{
    bufor_1 = (sRGBA*)malloc(WIDTH*HEIGHT*sizeof(sRGBA));
    for (int y = 0; y < HEIGHT; ++y)
    {
        for (int x = 0; x < WIDTH; ++x)
        {
            /* consecutive writes touch adjacent pixels in memory */
            bufor_1[x+y*WIDTH].r = 100;
        }
    }
    return 0;
}
A difference in performance appears when we want to fill such a buffer vertically, i.e. column by column, e.g. when rendering walls, which is done like this:
int main(void)
{
    bufor_1 = (sRGBA*)malloc(WIDTH*HEIGHT*sizeof(sRGBA));
    for (int x = 0; x < WIDTH; ++x)
    {
        for (int y = 0; y < HEIGHT; ++y)
        {
            /* consecutive writes are WIDTH pixels apart in memory */
            bufor_1[x+y*WIDTH].r = 100;
        }
    }
    return 0;
}
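To compare the two loops above, something like the following could be used. This is only a rough sketch: the function names fill_rows, fill_cols and time_fill are made up for illustration, and clock() measures CPU time with limited resolution, so the loop has to be repeated many times and the numbers will vary between machines and compiler settings.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define WIDTH  320
#define HEIGHT 256

typedef struct { unsigned char r, g, b, a; } sRGBA;

/* Row by row: consecutive writes are adjacent in memory. */
void fill_rows(sRGBA *buf, unsigned char v)
{
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x)
            buf[x + y * WIDTH].r = v;
}

/* Column by column: consecutive writes are WIDTH pixels apart. */
void fill_cols(sRGBA *buf, unsigned char v)
{
    for (int x = 0; x < WIDTH; ++x)
        for (int y = 0; y < HEIGHT; ++y)
            buf[x + y * WIDTH].r = v;
}

/* Hypothetical helper: CPU seconds spent calling `fill` `reps` times. */
double time_fill(void (*fill)(sRGBA *, unsigned char), sRGBA *buf, int reps)
{
    assert(buf != NULL && reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; ++i)
        fill(buf, (unsigned char)i);   /* vary the value on each pass */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_fill(fill_rows, bufor_1, 10000) and time_fill(fill_cols, bufor_1, 10000) on the same buffer gives two numbers that can be compared directly; the repetition count here is an arbitrary choice.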
The question is whether efficient line-by-line and column-by-column filling can somehow be combined. In the few tests I have run, when the buffer is declared as a two-dimensional array, column-by-column filling is even faster than line-by-line filling of the one-dimensional buffer. But then the situation is reversed: filling such a two-dimensional buffer line by line is inefficient.
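To make the two layouts concrete, here is a minimal sketch of the index arithmetic. It assumes the "two-dimensional" buffer is indexed as [x][y], so that the pixels of one column are adjacent in memory; the helper names idx_row_major and idx_col_major are hypothetical and only there for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define WIDTH  320
#define HEIGHT 256

/* Row-major (the one-dimensional buffer above):
   x+1 is the next element, y+1 is WIDTH elements further. */
size_t idx_row_major(size_t x, size_t y)
{
    assert(x < WIDTH && y < HEIGHT);
    return x + y * WIDTH;
}

/* Column-major (assumed layout of the two-dimensional buffer):
   y+1 is the next element, x+1 is HEIGHT elements further. */
size_t idx_col_major(size_t x, size_t y)
{
    assert(x < WIDTH && y < HEIGHT);
    return y + x * HEIGHT;
}
```

Whichever coordinate has stride 1 is the cheap direction to walk; the other direction always pays the large stride, which matches the reversed results described above.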
Solutions I was thinking about:
- rotate the buffer by 90 degrees; unfortunately this takes too much time, at least with the algorithms I tried, unless there is some extremely fast O(1) way
- some sort of buffer remapping, where a table holds pointers to the next pixels in a column; but that probably won't be cache-friendly either, or may be even worse. I haven't tested it.
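For the first idea, one commonly used variant is to transpose the buffer in small square tiles, so that both the reads and the writes stay within a few cache lines at a time. The sketch below is an assumption of how that could look, not something I have benchmarked: transpose_tiled and the TILE size of 16 are hypothetical choices. Note that it is still O(WIDTH*HEIGHT) work, not O(1), and that a transpose swaps rows and columns, which is not exactly a 90 degree rotation (a rotation would additionally mirror one axis).

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned char r, g, b, a; } sRGBA;

#define TILE 16  /* assumed tile edge in pixels; should be tuned by measuring */

/* Out-of-place transpose of a w x h row-major image:
   dst[x * h + y] = src[y * w + x]. dst must not alias src. */
void transpose_tiled(sRGBA *dst, const sRGBA *src, size_t w, size_t h)
{
    assert(dst != NULL && src != NULL && dst != src);
    for (size_t ty = 0; ty < h; ty += TILE) {
        for (size_t tx = 0; tx < w; tx += TILE) {
            /* clamp the tile at the right and bottom edges */
            size_t y_end = (ty + TILE < h) ? ty + TILE : h;
            size_t x_end = (tx + TILE < w) ? tx + TILE : w;
            for (size_t y = ty; y < y_end; ++y)
                for (size_t x = tx; x < x_end; ++x)
                    dst[x * h + y] = src[y * w + x];
        }
    }
}
```

With a second buffer of the same size, transpose_tiled(bufor_2, bufor_1, WIDTH, HEIGHT) would give a copy in which the columns of bufor_1 are contiguous (bufor_2 is a hypothetical name). Whether the copy costs less than the strided column writes it replaces would have to be measured.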