Optimizing a simple 2D Tile engine (+potential bugfix)

Question

Preface

Yes, there is plenty to cover here... but I'll do my best to keep this as well-organized, informative and straight-to-the-point as I possibly can!

Using the HGE library in C++, I have created a simple tile engine.
And thus far, I have implemented the following designs:

A CTile class, representing a single tile within a CTileLayer, containing row/column information as well as an HGE::hgeQuad (which stores vertex, color and texture information, see here for details).
A CTileLayer class, representing a two-dimensional 'plane' of tiles (which are stored as a one-dimensional array of CTile objects), containing the # of rows/columns, X/Y world-coordinate information, tile pixel width/height information, and the layer's overall width/height in pixels.

A CTileLayer is responsible for rendering any tiles which are either fully or partially visible within the boundaries of a virtual camera 'viewport', and to avoid doing so for any tiles which are outside of this visible range. Upon creation, it pre-calculates all information to be stored within each CTile object, so the core of engine has more room to breathe and can focus strictly on the render loop. Of course, it also handles proper deallocation of each contained tile.

Issues

The problem I am now facing essentially boils down to the following architectural/optimization issues:

In my render loop, even though I am not rendering any tiles which are outside of visible range, I am still looping through all of the tiles, which seems to have a major performance impact for larger tilemaps (i.e., any thing above 100x100 rows/columns @ 64x64 tile dimensions still drops the framerate down by 50% or more).
Eventually, I intend to create a fancy tilemap editor to coincide with this engine.
However, since I am storing all two-dimensional information inside one or more 1D arrays, I don't have any idea how possible it would be to implement some sort of rectangular-select & copy/paste feature, without some MAJOR performance hit -- involving looping through every tile twice per frame. And yet if I used 2D arrays, there would be a slightly less but more universal FPS drop!

Bug

As stated before... In my render code for a CTileLayer object, I have optimized which tiles are to be drawn based upon whether or not they are within viewing range. This works great, and for larger maps I noticed only a 3-8 FPS drop (compared to a 100+ FPS drop without this optimization).

But I think I'm calculating this range incorrectly, because after scrolling halfway through the map you can start to see a gap (on the topmost & leftmost sides) where tiles aren't being rendered, as if the clipping range is increasing faster than the camera can move (even though they both move at the same speed).

This gap gradually increases in size the further along into the X & Y axis you go, eventually eating up nearly half of the top & left sides of the screen on a large map. My render code for this is shown below...

Code

//
// [Allocate]
// For pre-calculating tile information
// - Rows/Columns   = Map Dimensions (in tiles)
// - Width/Height   = Tile Dimensions (in pixels)
//
void CTileLayer::Allocate(UINT numColumns, UINT numRows, float tileWidth, float tileHeight)
{
    m_nColumns = numColumns;
    m_nRows = numRows;

    float x, y;
    UINT column = 0, row = 0;
    const ULONG nTiles = m_nColumns * m_nRows;
    hgeQuad quad;

    m_tileWidth = tileWidth;
    m_tileHeight = tileHeight;
    m_layerWidth = m_tileWidth * m_nColumns;
    m_layerHeight = m_tileHeight * m_nRows;

    if(m_tiles != NULL) Free();
    m_tiles = new CTile[nTiles];

    for(ULONG l = 0; l < nTiles; l++)
    {
        m_tiles[l] = CTile();
        m_tiles[l].column = column;
        m_tiles[l].row = row;
        x = (float(column) * m_tileWidth) + m_offsetX;
        y = (float(row) * m_tileHeight) + m_offsetY;

        quad.blend = BLEND_ALPHAADD | BLEND_COLORMUL | BLEND_ZWRITE;
        quad.tex = HTEXTURE(nullptr); //Replaced for the sake of brevity (in the engine's code, I used a globally allocated texture array and did some random tile generation here)

        for(UINT i = 0; i < 4; i++)
        {
            quad.v[i].z = 0.5f;
            quad.v[i].col = 0xFF7F7F7F;
        }
        quad.v[0].x = x;
        quad.v[0].y = y;
        quad.v[0].tx = 0;
        quad.v[0].ty = 0;
        quad.v[1].x = x + m_tileWidth;
        quad.v[1].y = y;
        quad.v[1].tx = 1.0;
        quad.v[1].ty = 0;
        quad.v[2].x = x + m_tileWidth;
        quad.v[2].y = y + m_tileHeight;
        quad.v[2].tx = 1.0;
        quad.v[2].ty = 1.0;
        quad.v[3].x = x;
        quad.v[3].y = y + m_tileHeight;
        quad.v[3].tx = 0;
        quad.v[3].ty = 1.0;
        memcpy(&m_tiles[l].quad, &quad, sizeof(hgeQuad));

        if(++column > m_nColumns - 1) {
            column = 0;
            row++;
        }
    }
}

//
// [Render]
// For drawing the entire tile layer
// - X/Y          = world position
// - Top/Left     = screen 'clipping' position
// - Width/Height = screen 'clipping' dimensions
//
bool CTileLayer::Render(HGE* hge, float cameraX, float cameraY, float cameraTop, float cameraLeft, float cameraWidth, float cameraHeight)
{
    // Calculate the current number of tiles
    const ULONG nTiles = m_nColumns * m_nRows;

    // Calculate min & max X/Y world pixel coordinates
    const float scalarX = cameraX / m_layerWidth;  // This is how far (from 0 to 1, in world coordinates) along the X-axis we are within the layer
    const float scalarY = cameraY / m_layerHeight; // This is how far (from 0 to 1, in world coordinates) along the Y-axis we are within the layer
    const float minX = cameraTop + (scalarX * float(m_nColumns) - m_tileWidth); // Leftmost pixel coordinate within the world
    const float minY = cameraLeft + (scalarY * float(m_nRows) - m_tileHeight);  // Topmost pixel coordinate within the world
    const float maxX = minX + cameraWidth + m_tileWidth;                        // Rightmost pixel coordinate within the world
    const float maxY = minY + cameraHeight + m_tileHeight;                      // Bottommost pixel coordinate within the world

    // Loop through all tiles in the map
    for(ULONG l = 0; l < nTiles; l++)
    {
        CTile tile = m_tiles[l];
        // Calculate this tile's X/Y world pixel coordinates
        float tileX = (float(tile.column) * m_tileWidth) - cameraX;
        float tileY = (float(tile.row) * m_tileHeight) - cameraY;

        // Check if this tile is within the boundaries of the current camera view
        if(tileX > minX && tileY > minY && tileX < maxX && tileY < maxY) {
            // It is, so draw it!
            hge->Gfx_RenderQuad(&tile.quad, -cameraX, -cameraY);
        }
    }

    return false;
}

//
// [Free]
// Gee, I wonder what this does? lol...
//
void CTileLayer::Free()
{
    delete [] m_tiles;
    m_tiles = NULL;
}

Questions

What can be done to fix those architectural/optimization issues, without greatly impacting any other rendering optimizations?
Why is that bug occurring? How can it be fixed?

Thank you for your time!

It's a nicely composed question, but could really do with being separate questions. The architectural issues are separate from the bug and could even be separated from each other. — Joseph Mansfield, Apr 18 '13 at 09:59
@sftrabbit I thought about that, and then realized that they are in fact tied together if I had no other choice but to completely change the way I am optimizing the render code. Any big architectural change could potentially 'fix' the bug, so to speak. — RectangleEquals, Apr 18 '13 at 10:08

score 1 · Accepted Answer · answered Apr 18 '13 at 10:26

1

Optimising the iterating of the map is fairly straight forward.

Given a visible rect in world coordinates (left, top, right, bottom) it's fairly trivial to work out the tile positions, simply by dividing by the tile size.

Once you have those tile coordinates (tl, tt, tr, tb) you can very easily calculate the first visible tile in your 1D array. (The way you calculate any tile index from a 2D coordinate is (y*width)+x - remember to make sure the input coordinate is valid first though.) You then just have a double for loop to iterate the visible tiles:

int visiblewidth = tr - tl + 1;
int visibleheight = tb - tt + 1;

for( int rowidx = ( tt * layerwidth ) + tl; visibleheight--; rowidx += layerwidth )
{
    for( int tileidx = rowidx, cx = visiblewidth; cx--; tileidx++ )
    {
        // render m_Tiles[ tileidx ]...
    }
}

You can use a similar system for selecting a block of tiles. Just store the selection coordinates and calculate the actual tiles in exactly the same way.

As for your bug, why do you have x, y, left, right, width, height for the camera? Just store camera position (x,y) and calculate the visible rect from the dimensions of your screen/viewport along with any zoom factor you have defined.

answered Apr 18 '13 at 10:26

Clever, I'll have to test this asap! BTW, the parameters for `Render` are exactly that; X/Y (_world-space_) offsets, then pixel (_screen-space_) coords for the top/left/width/height. There is no zoom feature at this time. – RectangleEquals Apr 18 '13 at 10:44
Something's not right, here... In your example, which coordinate space(s) belong to which variables (i.e., world-space vs. screen-space)? When I plug in my camera variables, I get stuck in the inner-loop because `visiblewidth` starts off as 1920 (which is the width of my screen, in pixels), until an out-of-bounds exception occurs, since the inner-loop needs to execute 1920 times when there's nowhere near that many tiles to render... – RectangleEquals Apr 18 '13 at 11:27
visiblewidth and visibleheight is the number of visible tiles. (As calculated from the computed tile coordinates tr, tl, tb, tt.) – Apr 18 '13 at 13:51
You will definitely have to make sure your camera doesn't scroll off the edge of the map, or if you want that behaviour, you'll have to clip tl, tt, tr, and tb before entering the render loop. – Apr 18 '13 at 13:54
Okay so I've tried to implement this in every way I that can think of, but I still can't get it to work. I've locked the camera view to always be within bounds of the map, as well as bumped up the size of the map to be larger than the view, and I'm still getting out-of-bounds exceptions. My camera is just an X/Y pixel offset, and I'm adding the width/height of the screen to get the bottom and right sides. In your example, I've even tried changing `layerWidth` to `m_tileWidth` (which would seem more feasible) and yet it still crashes. What am I doing wrong? – RectangleEquals Apr 19 '13 at 05:02
Here's my most recent implementation, tested on a 20x20 map with 64x64 pixel dimensions, and 1920x1200 screen dimensions: http://pastebin.com/55jUQd7z – RectangleEquals Apr 19 '13 at 05:08
You're not converting camera coordinates to tile coordinates, you're merely copying them. If your tiles are 64 pixels wide/high, you need to divide your camera coordinates by 64 to get the tile index. tl = std::max( 0, camx / 64 ); tb = std::min(( camy + screenheight ) / 64, mapheight - 1 ); etc... – Apr 19 '13 at 08:23
Also, is m_TileWidth the width of each tile, or the width of the map? When calculating the tile indices, you should be dividing by the width of each tile, when converting tile x/y coordinates to your tile array index, you should multiply by the map width in tiles. – Apr 19 '13 at 08:28
`m_tileWidth` is the width of each tile whereas `m_layerWidth` is the width of the map _(ie, `m_nColumns * m_tileWidth`)_. Thanks, I got it to work using a slightly different approach: http://pastebin.com/EfCrDSLu – RectangleEquals Apr 19 '13 at 08:34
1

Yep, that works as well. My approach just optimises the tile index calculation out of the loop, but I doubt you're going to see a major performance loss because of that. :) – Apr 19 '13 at 08:44

MetalGodwin · Answer 2 · 2015-08-02T19:37:10.003

This is a pseudo codish example, geometry variables are in 2d vectors. Both the camera object and the tilemap has a center-position and a extent (half size). The math is just the same even if you decide to stick with pure numbers. Even if you don't use center coordinates and extent, perhaps you'll get an idea on the math. All of this code is in the render function, and is rather simplified. Also, this example assume you already got a 2D array -like object that holds the tiles.

So, first a full example, and I'll explain each part further down.

// x and y are counters, sx is a placeholder for x start value as x will 
// be in the inner loop and need to be reset each iteration.
// mx and my will be the values x and y will count towards too.
x=0,
y=0, 
sx=0, 
mx=total_number_of_tiles_on_x_axis, 
my=total_number_of_tiles_on_y_axis

// calculate the lowest and highest worldspace values of the cam
min = cam.center - cam.extent
max = cam.center + cam.extent

// subtract with tilemap corners and divide by tilesize to get 
// the anount of tiles that is outside of the cameras scoop
floor = Math.floor( min - ( tilemap.center - tilemap.extent ) / tilesize)
ceil = Math.ceil( max - ( tilemap.center + tilemap.extent ) / tilesize)

if(floor.x > 0)
    sx+=floor.x
if(floor.y > 0)
    y+=floor.y

if(ceil.x < 0)
    mx+=ceil.x
if(ceil.y < 0)
    my+=ceil.y

for(; y<my; y++)
    // x need to be reset each y iteration, start value are stored in sx
    for(x=sx; x<mx; x++)
       // render tile x in tilelayer y

Explained bit by bit. First thing in the render function, we will use a few variables.

// x and y are counters, sx is a placeholder for x start value as x will 
// be in the inner loop and need to be reset each iteration.
// mx and my will be the values x and y will count towards too.
x=0,
y=0, 
sx=0, 
mx=total_number_of_tiles_on_x_axis, 
my=total_number_of_tiles_on_y_axis

To prevent rendering all tiles, you need to provide either a camera-like object or information on where the visible area starts and stops (in worldspace if the scene is movable)

In this example I'm providing a camera object to the render function which has a center and an extent stored as 2d vectors.

// calculate the lowest and highest worldspace values of the cam
min = cam.center - cam.extent
max = cam.center + cam.extent

// subtract with tilemap corners and divide by tilesize to get 
// the anount of tiles that is outside of the cameras scoop
floor = Math.floor( min - ( tilemap.center - tilemap.extent ) / tilesize)
ceil = Math.ceil( max - ( tilemap.center + tilemap.extent ) / tilesize)
// floor & ceil is 2D vectors

Now, if floor is higher than 0 or ceil is lower than 0 on any axis, it means that there just as many tiles outside of the camera scoop.

// check if there is any tiles outside to the left or above of camera
if(floor.x > 0)
    sx+=floor.x// set start number of sx to amount of tiles outside of camera
if(floor.y > 0)
    y+=floor.y // set startnumber of y to amount of tiles outside of camera

 // test if there is any tiles outisde to the right or below the camera
 if(ceil.x < 0)
     mx+=ceil.x // then add the negative value to mx (max x)
 if(ceil.y < 0)
     my+=ceil.y // then add the negative value to my (max y)

A normal render of the tilemap would go from 0 to number of tiles that axis, this using a loop within a loop to account for both axis. But thanks to the above code x and y will always stick to the space within the border of the camera.

// will loop through only the visible tiles
for(; y<my; y++)
    // x need to be reset each y iteration, start value are stored in sx
    for(x=sx; x<mx; x++)
       // render tile x in tilelayer y

Hope this helps!

Optimizing a simple 2D Tile engine (+potential bugfix)

Preface

Issues

Bug

Code

Questions

2 Answers2