2

I wonder if it is possible to solve a certain problem. In short: get optimal performance by filling the buffer not only line by line but also column by column.

Description below: A graphic buffer is given (i.e. intended to hold a bitmap)

#define WIDTH 320
#define HEIGHT 256

typedef struct
{
  unsigned char r,g,b,a;
}sRGBA;

sRGBA* bufor_1;

main()
{
   bufor_1 = (sRGBA*)malloc(WIDTH*HEIGHT*sizeof(sRGBA));
}

There is no problem with filling it horizontally line by line, because it is a 'cache friendly' case, which is the best one, e.g. floor and ceiling rycasting:

main()
{
   bufor_1 = (sRGB*)malloc(WIDTH*HEIGHT*sizeof(sRGB));

  for (int y = 0; y < HEIGHT; ++y)
  {
      for (int x = 0; x < WIDTH; ++x)
     {
         bufor_1[x+y*WIDTH].r = 100;
      }
  }
}

The difference in performance appears when we want to supplement such a buffer vertically, i.e. column by column, e.g. wall regeneration, which is done in this way, i.e.

main()
{
   bufor_1 = (sRGB*)malloc(WIDTH*HEIGHT*sizeof(sRGB));

  for (int x = 0; x < WIDTH; ++x)
  {
      for (int y = 0; y < HEIGHT; ++y)
     {
         bufor_1[x+y*WIDTH].r = 100;
      }
  }
}

The question that arises is whether it is possible to somehow combine efficient line-by-line and column-by-column completion. From a few tests that I have performed, it turned out that if the buffer is presented as two-dimensional, i.e. column-by-column filling is even faster than line-by-line in a one-dimensional one - but then it is the other way around, i.e. filling such a two-dimensional buffer line by line will be inefficient.

Solutions I was thinking about:

  • rotate the buffer 90 degrees, unfortunately it takes too much time, at least with the algorithms that I checked, unless there is some mega-fast N (1) way
  • some sort of buffer remapping so that some table contains pointers to the next pixels in the column, but it probably won't be 'cache friendly' or even worse - I haven't checked anyway
Mateusz
  • 119
  • 10
  • Please describe what exactly are you trying to draw. Maybe link some image – tstanisl Mar 03 '21 at 18:04
  • I working on my software Racycaster engine, I am not using any 3d API, just drawing into buffer and bliting the buffer to screen. We have 2 steps here. FIRST is the WALLS, walls are drawn column by column in the buffer - unfortunately putting date in buffer in that way is not "cache friendly" and takes more time.. the SECOND step is drawing FLOOR and CEILING, in that case we are drawing line by line so its much faster. The LOOP looks like this. – Mateusz Mar 03 '21 at 18:24
  • What machine is it? X86? Does it support AVX instructions? – tstanisl Mar 03 '21 at 18:31
  • currently working on Intel x86 but my goal is to port it on m68k, so I am looking only for general solutions.. – Mateusz Mar 03 '21 at 19:08
  • To make it really fast you must do the shading of walls whith drawing ceiling and the floor. Is the color of the vertical line of wall constant? How do you select color of walls? – tstanisl Mar 03 '21 at 19:22
  • ohh, sorry I forgot the link with YT video.. it looks like this: https://www.youtube.com/watch?v=yeHRCyc2N-M – Mateusz Mar 03 '21 at 20:00
  • I see, texture sampling complicates things. It is possible to render horizontally if flat shading was used. With textures there would still be cache misses in the texture buffers. What kind of transpose algorithm have you tried. Transposing in tiles (like 4x4 or 8x8 pixels) should be quite cache-friendly. – tstanisl Mar 03 '21 at 21:08
  • No I didn't transpose the buffer yet, I was searching for some algorithms like this below but I am not sure if it will compensate the time when te buffer must be filled by columns. Its taking a bit of time, must do some tests.. do you know maybe that kind of algorithm that will transpose array in place? https://stackoverflow.com/questions/848025/rotating-bitmaps-in-code – Mateusz Mar 03 '21 at 21:39
  • Algorithm in the link looks ok. You do not have to do it in place. It's rather cumbersome if the matrix is not a square. – tstanisl Mar 03 '21 at 21:48
  • @Mateusz are you sure your problems are CACHE related (they are big enough now to hold your 320x200 buffer entirely)? and not by the line filling it self. You know horizontal lines aligned to horizontal buffer is just incrementing the pointer but vertical line must add x-resolution value (if not coded properly for instance using multiplication instead of addition might cause slowdowns) However even bigger performance hit is the perspective correct texture mapping iteration needed for the floor and ceiling which is my bet the real issue so no buffer change will help with that. – Spektre Mar 04 '21 at 09:16
  • @Mateusz see [Ray Casting with different height size](https://stackoverflow.com/a/47251071/2521214) it might interest you. Mine vertical line has this `sy=sy0+((n*dsy)/dn); a=((n<>acc1)+_tx; ty=ty0+((a*dty)>>acc1);` inside iteration loop where horizontal line has just DDA increment of `tx` and `ty` and normal increment of `x` axis instead. The `tx,ty` is texture coordinate. If your routine is coded less optimized that could cause the major speed difference instead of CACHE... – Spektre Mar 04 '21 at 09:25
  • @Spektre Well, yes, I think the buffer filling won't help much.. Without floor and ceiling with only walls I got average 800-1200 fps at 640x512, adding floor and ceiling drops everything significally to 200-300 - problably because the whole 640x512 have to be fill depite the filing is friendly line by line. At know I got about 100-200 fps with 800x600 with is nice result. The one thing I am thinking of is to fill buffer with ceil and floor only in places that floor and ceiling is visible - by now the first step is to fill the whole buffer with ceiling and floor, and then drawing walls – Mateusz Mar 04 '21 at 11:58
  • @Mateusz hmm you fill whole ceiling and floor ? that is weird ... – Spektre Mar 04 '21 at 12:00
  • @Spektre as far I know, there are two main techniques: first will draw floors in the same loop as walls are, column by column, so we only draw floors where they need to be drawn - but this technique is very slow. Second draws floors line by line is its own loop that covers entire buffer, then the second loop draws walls. This is much faster.The bad side is that we calculate and draw unnecessary floors that are hide by walls after. I will try to optimize it. Do you know other technique? This is my second approach and now I am using this by lodev: https://lodev.org/cgtutor/raycasting2.html – Mateusz Mar 04 '21 at 13:47
  • @Spektre ps. I saw your code already some time ago when I was starting writing the my engine, I must read it again :) – Mateusz Mar 04 '21 at 13:49
  • @Mateusz sadly in there are not the floor and ceilings implemented as the code with them did not fit 30K limit of answer – Spektre Mar 04 '21 at 19:09

0 Answers0