3

I'm confused to decide which or what kind of algorithm to find an object based on the following criteria: There are 2 classes: 'TileSets' and 'Tile'. TileSet have 2 int attributes: firstTileId and lastTileId, while Tile has a single int attribute: id, like that:

struct TileSet { int firstTileId, lastTileId; } 

struct Tile { int id; }

The application is supposed to have no more than 10 TileSets (normally 3-5) and 10.000+ Tiles. Speed is extremely crucial to determine which TileSet a Tile with a given id belongs to. The first and last id attributes don't change after a tileset is added to the vector and they don't overlap each other, for example: {{1, 25}, {26, 125}, {126, 781}, {782, 789}...}. There are no holes either in the tile range, as we can see. Tiles vector is not ordered nor it can be. My current implementation (kind of pseudo short code) is:

Vector t = 10.000+ tiles
Vector ts = tilesets with a size of a number of a power of 2 number bigger than 6, at least
for tileIndex = 0; tileIndex < t.size; tileIndex++, do:
   for tilesetIndex = 0; tilesetIndex < ts.size; tilesetIndex++, do:
      if (ts[tilesetIndex].firstTileId >= t[tileIndex].id && t[tileIndex].id <= ts[tilesetIndex].lastTileId) 
         // tile t[tileIndex] belongs to the tileset ts[tilesetIndex]! Done!

What kind of algorithm could I use for this kind of situation? Are there any formulas to this?

fireant
  • 14,080
  • 4
  • 39
  • 48
Yves Calaci
  • 1,019
  • 1
  • 11
  • 37
  • 1
    Can the tilesets interlap? Do they change often? Do you always have to determine a set _for each_ tile? – Petr Oct 19 '15 at 13:53
  • I don't understand what you mean by 'interlap' and changes. I always have to determine which tileset a tile with a given I'd belongs to, in each of the 10.000+ iteration. Tilesets are ordered inserted, in other words, the vector will have entries with ranges (first and last id) like that: {{1, 25}, {26, 125}, {126, 781}, {782, 789}...}. Tiles vector is not ordered. – Yves Calaci Oct 19 '15 at 13:57
  • Can there be two tilesets one starting from 0 to 100 and the other starting from 50 to 150? You see, they interlap, as tiles from 50 to 100 belong to both of them. For "changes", I mean, are the tilesets fixed for all the time your application runs, or can tileset change their first and last id while the application runs? And how often can they change? – Petr Oct 19 '15 at 13:59
  • 1
    @Petr I believe the correct term is overlap not interlap. Maybe you meant interleave? – NathanOliver Oct 19 '15 at 14:01
  • @NathanOliver, yes, I meant overlap. It seems that it blended with intersect in my mind to produce interlap :) Though [some](http://dictionary.reference.com/browse/interlap) online dictionaries say that this word does exist too as a synonym to overlap. – Petr Oct 19 '15 at 14:02
  • @Petr, I updated my answer, so no: they don't interlap, overlap or whatsoever. They don't change either. – Yves Calaci Oct 19 '15 at 14:05
  • You could do a binary search on the tile sets. But it's probably faster to do a pre-processing step where you associate each tile with its tile set. – Nico Schertler Oct 19 '15 at 14:09
  • @Nico, and how's that association supposed to be? That's my main problem. Also, I don't see how a binary search can speed things up. Can you enlighten me? Things start getting slower when id's don't belong to the first tileset, which makes it loop through the tilesets vector too to find if the I'd matches it's range – Yves Calaci Oct 19 '15 at 14:15
  • Petr just posted the explanation for this. – Nico Schertler Oct 19 '15 at 14:16
  • You say "10.000+" tiles. Do you have a hard (or even somewhat firm) limit on the upper bound? Can we be reasonably certain the range of IDs is less than, say, a few million? – Jerry Coffin Oct 19 '15 at 23:30
  • @Jerry I'm not sure if I understand you, but by the time were trying to determine which tileset a tile belongs to, we will have: the total number of tiles (basically, tileVector.size()) and the number of tilesets (with the same previous approach). The number of tiles can also be calculed using a formula. The ranges can vary: the first one can be {1, 54} or even a super large one {1, 2407}. – Yves Calaci Oct 20 '15 at 02:38
  • @YvesHenri: No, I mean what is the range (the difference in values) between the largest TileID and the smallest TileID (overall, not just in a single range). – Jerry Coffin Oct 20 '15 at 04:07
  • @Jerry It's unknown, though we can set a huge one, like we can limit it to be 10.000, in other words, by doing so, we would have a maximum of 10.000 types of tiles. Is this what you're asking? Thanks for the heads up! – Yves Calaci Oct 20 '15 at 16:26
  • just added the missing `lastTileId` to your pseudo code – fireant Nov 03 '15 at 17:33

5 Answers5

7

You'd use an interval container that uses optimized storage and algorithms.

In this example using Boost ICL, I have made some "arbitrary" choices to generate nice disjunct TileSets:

using TileSets = icl::split_interval_set<int>;

struct TileSet : TileSets::interval_type::type {
    TileSet(int b, int e) : TileSets::interval_type(closed(b, e)) {}
};

struct Tile : TileSets::interval_type::type {
    Tile(int id) : TileSets::interval_type(closed(id, id)) {}
};

The beauty is the high-level coding:

Live On Coliru

TileSets gen_tiles   (size_t n = 100000);
TileSets gen_tilesets(size_t n = (2ull << 8) + 1);

#include <iostream>

int main() {
    auto const tiles = gen_tiles   (10);
    auto const ts    = gen_tilesets(30);

    std::cout << ts << "\n----\n";

    for (auto hit : tiles & ts) {
        std::cout << hit.lower() << " hits in tileset " << *ts.find(hit) << "\n";
    }
}

Prints

{[8,71)[71,133)[133,206)[206,231)[231,465)[465,467)[467,565](565,581)[581,651](651,907)[907,1000)[1000,1395](1395,1429)[1429,1560](1560,1706)[1706,1819)[1819,1835)[1835,1997)[1997,2124](2124,2328)[2328,2913)[2913,2922)[2922,3043)[3043,3338)[3338,3664](3664,3825](3825,3999)[3999,4320](4320,4506](4506,4561](4561,4593](4593,4668)[4668,5143)[5143,5248](5248,5633)[5633,5925](5925,6012](6012,6076)[6076,6117](6117,6119](6119,6175](6175,6184)[6184,6509)[6509,6804](6804,7081](7081,7220)[7220,7852](7852,8325)[8325,8600](8600,8662](8662,9386](9386,9423)[9423,9489](9489,9657](9657,9700](9700,9738](9738,9833](9833,9923]}
----
1561 hits in tileset (1560,1706)
1835 hits in tileset [1835,1997)
3746 hits in tileset (3664,3825]
4459 hits in tileset (4320,4506]
5969 hits in tileset (5925,6012]
5987 hits in tileset (5925,6012]
7320 hits in tileset [7220,7852]
7797 hits in tileset [7220,7852]
7966 hits in tileset (7852,8325)
9508 hits in tileset (9489,9657]

PERFORMANCE

When run with the default sizes (100000 tiles in 2^8+1 tilesets) it takes 0.034s on my box

$ time ./test | tee >(echo "total lines: $(wc -l)") | tail
9987 hits in tileset (9984,9990]
9988 hits in tileset (9984,9990]
9989 hits in tileset (9984,9990]
9990 hits in tileset (9984,9990]
9991 hits in tileset (9990,9995]
9992 hits in tileset (9990,9995]
9993 hits in tileset (9990,9995]
9994 hits in tileset (9990,9995]
9995 hits in tileset (9990,9995]
total lines: 9988

real    0m0.034s
user    0m0.029s
sys 0m0.008s

Live On Coliru that runs in 0.064s. That includes the time taken for output, which does redundant lookups (ts.find(hit))!

UPDATE - Higher Volumes

More performance testing with higher volumes and more specific timing output:

Live On Coliru

#include <boost/icl/interval_set.hpp>
#include <boost/icl/split_interval_set.hpp>

namespace icl = boost::icl;

using TileSets = icl::split_interval_set<int>;

struct TileSet : TileSets::interval_type::type {
    TileSet(int b, int e) : TileSets::interval_type(closed(b, e)) {}
};

struct Tile : TileSets::interval_type::type {
    Tile(int id) : TileSets::interval_type(id) {}
};

TileSets gen_tiles   (size_t n = (1ull << 22));
TileSets gen_tilesets(size_t n = (1ull << 12));

#include <iostream>
#include <iomanip>
#include <boost/chrono/chrono_io.hpp>

template <typename F>
auto timed(char const* task, F&& f) {
    using namespace boost::chrono;
    struct _ {
        high_resolution_clock::time_point s;
        const char* task;
        ~_() { std::cout << " -- (" << task << " completed in " << duration_cast<milliseconds>(high_resolution_clock::now() - s) << ")\n"; }
    } timing { high_resolution_clock::now(), task };

    return f();
}

int main() {
    auto const tiles = timed("Generate tiles", [] { return gen_tiles(); });
    auto const ts    = timed("Generate tile sets", [] { return gen_tilesets(); });

    //std::cout << ts << "\n----\n";

    std::cout << "Random tiles generated:    " << tiles.iterative_size() << " across a domain of " << std::setprecision(2) << static_cast<double>(tiles.size()) << "\n";
    std::cout << "Tilesets to match against: " << ts.iterative_size()    << " across a domain of " << std::setprecision(2) << static_cast<double>(tiles.size()) << "\n";

    timed("Query intersection", [&] { std::cout << "Total number of hits: "   << (tiles & ts).iterative_size() << "\n"; });
    timed("Query difference",   [&] { std::cout << "Total number of misses: " << (tiles - ts).iterative_size() << "\n"; });

    //for (auto hit : tiles & ts) {
        //std::cout << hit.lower() << " hits in tileset " << *ts.find(hit) << "\n";
    //}
}

#include <random>

static auto gen_tile_id = [prng=std::mt19937{42}, dist=std::uniform_int_distribution<>()] () mutable 
    { return dist(prng); };

TileSets gen_tiles(size_t n) {
    TileSets r;
    std::generate_n(icl::inserter(r, r.end()), n, [] () -> Tile { return gen_tile_id(); });
    return r;
}

TileSets gen_tilesets(size_t n) {
    TileSets r;
    std::generate_n(icl::inserter(r, r.end()), n, [] () -> TileSet {
                auto b = gen_tile_id(), e = gen_tile_id();
                return { std::min(b,e), std::max(b,e) };
            });
    return r;
}

Prints (on my box):

 -- (Generate tiles completed in 3773 milliseconds)
 -- (Generate tile sets completed in 152 milliseconds)
Random tiles generated:    4190133 across a domain of 4.2e+06
Tilesets to match against: 8191 across a domain of 4.2e+06
Total number of hits: 4187624
 -- (Query intersection completed in 1068 milliseconds)
Total number of misses: 2509
 -- (Query difference completed in 533 milliseconds)
sehe
  • 374,641
  • 47
  • 450
  • 633
  • Here's the timings for a "huge" test case: Find which tiles from 2²² tiles do not occur in a given 2¹³ tileset: takes 540ms. **[Live On Coliru takes longer: 1581 milliseconds](http://coliru.stacked-crooked.com/a/28bdf1600408af0c)** – sehe Oct 19 '15 at 15:28
  • Thanks for the collaboration. I appreciate the help. Although I can not use boost and the 'nested for' solution even SEEMS to run faster than this one (comparing times from my machine to yours - 30 ms on an i7 3.4 and 4mb ram) – Yves Calaci Oct 19 '15 at 16:07
  • @YvesHenri would you care to share the code that you used to compare? – sehe Oct 19 '15 at 17:38
  • TimeProfiler: http://pastebin.com/PUPuzXCQ, TileSetManager: http://pastebin.com/gtnLfy0S, usage: http://pastebin.com/TJQvSyEp. I'm still not satisfied. I'm sure there might be a way to avoid storing X pointers to a TileSet, where X is the total number of Tiles, using some kind of a hashing function, for example: http://pastebin.com/thg8sGzt. The last link exemplifies how to transform a number into the next power of 2 number. If I could do some round up value with a given tileId to find the index of its corresponding TileSet, that would be very awesome (what I'm trying to do)! – Yves Calaci Oct 19 '15 at 19:42
  • Wait a bloddy moment. Are ranges consecutive?!? – sehe Oct 19 '15 at 20:12
  • yes, the range values are consecutive. Everything is conspiring in favor of it (values don't change nor overlap, no holes, consecutive values...), though I can't think of a better solution. What do you mean by 'comment the call to getTileSetByTileId out entirely, the generated assembly listing is exactly identical'? If I could only apply some formula to ceil or floor a tileId to its corresponding tileset range or at least diminishes the number of entries in the lookup vector (less iterations) using some kind of key pattern, that would help. Can you think of any? Thanks a lot – Yves Calaci Oct 19 '15 at 20:55
  • Yeah. Basically, see here: http://stackoverflow.com/a/33038436/85371. But I'm still live-refactoring towards a sane & fair comparison of the benchmarks! – sehe Oct 19 '15 at 20:56
  • Okay, I made a fair comparison now. The program tallies the number of samples (tile IDs) which go in which bracket. It queries 4M tiles against 1024 brackets. The query times are roughly the same ([old in ~479ms](http://paste.ubuntu.com/12868892/) vs. [new ICL in ~388ms](http://paste.ubuntu.com/12868897/)). The clincher is, the old version uses [**12.52GiB of RAM**](http://paste.ubuntu.com/12868966/) vs. [**only 16MiB**](http://paste.ubuntu.com/12868994/). Also, the time taken to setup the `TileSetManager` is 4965ms, whereas the time taken to setup the `TileSets` in my variant is <1ms. – sehe Oct 19 '15 at 23:06
  • So, in conclusion, the old version takes ~800x as much RAM and ~infinitely more time for initializing. The query times seem pretty similar, though. See my test bed code so you can replicate the whole ordeal: https://gist.github.com/sehe/f36ae99f42394a81e944 (it includes the intermediate revisions). You can see the recorded livestream [Part #1](http://tinyurl.com/pgaohhs) and [Part #2](http://tinyurl.com/pe6x95e) ([experiment](http://chat.stackoverflow.com/transcript/10?m=24182469#24182469)) – sehe Oct 19 '15 at 23:18
2

As your tile sets do not change, then the best strategy is to do some precalculation that will allow for a faster lookup. I can see several good approaches to this.

Lookup table

If tile ids are integers and not big enough, you can just create a lookup table. For each id you just record the number of tileset this id belongs to. Something like this

for set in tilesets
    for id=set.first to set.last
        setLookup[id] = set.number

Now to find a set by a tile id, you just look up

setLookup[tile.id]

Binary search

A second approach works if your tile ids are not integers or can be so big that the lookup table becomes impractical. Then you sort all your tilesets in advance so that their firsts increase (or lasts increase, which is equivalent as the sets do not overlap), and then use binary search to find the tileset given a tile id. However, if you have really a few of tilesets, this might be not faster than a sequential lookup, you'll have to test it.

Static association

Finally, if your tile ids do not change either, then I do not see why you can not associate tiles with tile sets completely in advance. Just have an additional field in your Tile class that stores the TileSet number (or reference or whatever).


Note that by saying "do not change" I mean "change not too often". If changes are allowed, but are quite rare, than you can implement any solution which assumes that it does not change, and do a complete recalculation each time something changes.

Petr
  • 9,812
  • 1
  • 28
  • 52
  • I kind of like the idea of building a lookup table, however it seems to take as much time as using the basic 'nested for' solution in my first post. For this, I created a TileSetManager that operates 2 vectors: one made of TileSet and the other pointers of TileSets. When adding a tileset to the manager, it stores in its tileset vector and also expand the lookup vector by adding a pointer to the tileset previously added, so when querying a tileset from a tile I'd, just do: return *lookup[tileId]; – Yves Calaci Oct 19 '15 at 16:13
  • Idk how to use code tags on comments and my phones 'enter key' doesn't create a new line on here, so.... The lookup vector creation is like this: for (int I = set.first; I <= set.last; i++) lookup.push_back(&sets.back()); – Yves Calaci Oct 19 '15 at 16:20
  • How would that association be?! It's important to note that the tilesets vector gets filled first and the tiles, or vice versa. I can not assign the tile's tileset when filling the tiles vector. Basically the tiles vector gets filled 2 times: one to fetch the tile I'd and the other to fetch the tilesets 'id' (again, when fetching the id's, the app doesn't know anything about the tilesets yet) – Yves Calaci Oct 19 '15 at 21:28
  • Sorry for the multiple posts. I'm new here and forgot about the 'at + name' thingy, thus I didn't know if you were being notified anyways or not – Yves Calaci Oct 20 '15 at 16:38
2

To this problem I would use optimized binary tree search, taking into account sizes of intervals. If tile ids has uniform distribution it may have sense to minimize the count of comparisons required to determine the TileSet for TileSet with bigger interval. This idea reminds Huffman coding algorithm where binary tree gets built the way that encoding of more frequent symbols the path in tree is minimized

Consider the following example.

Given TileSets:

[0,2), [2,9), [9,34), [34,39), [39,48), [48,148), [148,153), [153,154)

then sizes of intervals are:

2,7,25,5,9,100,5,1

Total interval length (sum of intervals) is:

length = 154 

Let's estimate number of comparisons for following approaches

  1. One-by-one comparison (like implemented in your question) If Tile belongs to first TileSet, then to find first TileSet required one comparison; If Tile belongs to second TileSet, two comparisons required, If Tile belongs to third TileSet, three comparisons required, and so on:

    C1 = (2*1 + 7*2 + 25*3 + 5*4 + 9*5 + 100*6 + 5*7 + 1*8)/length = 799/154 = 4.84
    
  2. Binary tree.

            / \
          /     \
        /         \
       / \       /  \
      /   \     /    \
     /\   /\   /\    /\
    2  7 25 5 9 100 5  1
    

    Every path takes 3 comparisons, so:

    C2 = 3
    
  3. Weighted tree.

             /  \
           /      \
         / \        \
       /\   \       / \
     /\  \   /\    /  /\
    2  7 25 5  9 100 5  1
    

    Comparisons estimation:

    C3 = (2*4+7*4+25*3+5*3+9*3+100*2+5*3+1*3)/154 = 2.41
    

As seen third approach requires less comparisons than others.

Tree gets built the following way: split TileSets into two parts such way that difference between sum of weights of the left part and the right part is minimized. For given example:

[2,7,25,5,9,100,5,1] => [2,7,25,5,9],[100,5,1]

Perform split on left and right parts until tree is built.

This approach is profitable when some TileSets are much wider than others.

Nyavro
  • 8,806
  • 2
  • 26
  • 33
1

10x faster? Here is how you can make your code run about 10 times (or more) faster. We want to remove branches, and vectorize our inner loop with the help of gcc.

We want to remove the condition inside the loop:

for (int i=0; i<10000; ++i) {
  for (int j=0; j<8; j++) {
    if ((tiles[i] >= lowerBounds[j]) &&
        (tiles[i] <= upperBounds[j])) {
      ids[i] = j;
    }
  }
}

This is a quick solution which you can probably improve:

for (int i=0; i<10000; ++i) {
  for (int j=0; j<8; ++j) {
    short int ld = range[j] - tiles[i] + lowerBounds2[j];
    ld = ld<0?0:ld;
    ld = ld>(range[j]-1)?0:ld;
    ld = ld>1?1:ld;
    ids2[i] += j*ld;
  }
}

The second solution is about 10 times faster on an i5-4200U if you ask g++ to optimize the code as we don't have the time for the AVX intrinsics and etc.:

g++ -std=c++11 -O3 -march=native

The timings for 10,000 tiles, and 8 tile-ranges, while the cpu speed is fixed at its base frequency:

Trivial: 0.147607 ms
Optimized: 0.014068 ms

The timing while the cpu is allowed to throttle to its highest frequency:

Trivial: 0.043876 ms
Optimized: 0.004328 ms

Here is the (quick & dirty) code, and I think you get the idea and can improve it:

#include <iostream>
#include <random>
#include <chrono>
#include <cstring>

using namespace std;
using namespace std::chrono;

int main() {
  short int lowerBounds [8] = {0, 2,  9, 34, 39,  48, 148, 153};
  short int upperBounds [8] = {1, 8, 33, 38, 47, 147, 152, 154};
  short int range       [8] = {3, 8, 26,  6, 10, 101,   6,   3};
  short int lowerBounds2[8] = {-1, 1, 8, 33, 38,  47, 147, 152};
  short int tiles [10000];
  short int ids [10000] = {0};
  short int ids2 [10000] = {0};

  // 10,000 random tiles
  default_random_engine gen;
  uniform_int_distribution<short int> dist(0, 154);
  for (int i=0; i<10000; ++i) {
    tiles[i] = dist(gen);
  }

  // *** trivial solution
  double bestTime = 1.0;
  for (int r=0; r<100; r++) {
    auto t1 = high_resolution_clock::now();
    for (int i=0; i<10000; ++i) {
      for (int j=0; j<8; j++) {
        if ((tiles[i] >= lowerBounds[j]) &&
            (tiles[i] <= upperBounds[j])) {
          ids[i] = j;
        }
      }
    }
    auto t2 = high_resolution_clock::now();
    auto elapsed = duration_cast<duration<double>>(t2 - t1).count();
    if (elapsed < bestTime)
      bestTime = elapsed;
  }
  cout<<"Trivial: "<<bestTime*1000<<" ms"<<endl;

  // *** optimized solution
  bestTime = 1.0;
  for (int r=0; r<100; r++) {
    // ids should be zero for this method
    memset(ids2, 0, 10000*sizeof(short int));
    auto t1 = high_resolution_clock::now();
    for (int i=0; i<10000; ++i) {
      for (int j=0; j<8; ++j) {
        short int ld = range[j] - tiles[i] + lowerBounds2[j];
        ld = ld<0?0:ld;
        ld = ld>(range[j]-1)?0:ld;
        ld = ld>1?1:ld;
        ids2[i] += j*ld;
      }
    }
    auto t2 = high_resolution_clock::now();
    auto elapsed = duration_cast<duration<double>>(t2 - t1).count();
    if (elapsed < bestTime)
      bestTime = elapsed;
  }
  cout<<"Optimized: "<<bestTime*1000<<" ms"<<endl;

  // validate
  for (int i=0; i<10000; i++)
    if ((ids[i] - ids2[i]) != 0) {
      cout<<"The results didn't match!"<<endl;
      break;
    }
}

You could also use multi-threading to gain a bit more speedup. That I supposed is easy for you to implement.

NB: If you don't set those optimization flags, the method I suggest will be just slightly faster or possibly even slower than the trivial method.

fireant
  • 14,080
  • 4
  • 39
  • 48
0

First of all, I'd sort the tile sets. E.g. first by firstTileId and then by lastTileId. Then you can use a binary search (untested code, please beware):

auto findTileSetIndex(const Vector& sets,
                      size_t start, size_t end,
                      const Tile& value)
-> signed int {
    if(start == end) return -1;
    size_t mid = start + (end-start)/2;
    if(sets[mid].firstTileId <= t[tileIndex].id &&
       sets[mid].lastTileId > t[tileIndex].id)
        return mid;
    if(sets[mid].firstTileId > t[tileIndex].id)
        return findTileSetIndex(sets, start, mid, value);
    return findTileSetIndex(sets, mid, end, value);
}

for(auto& tile : t) {
    auto tileSetIndex = findTileSetIndex(ts, 0, ts.size(), t);
    if(tileSetIndex > 0) {
       // t belongst to ts[tileSetIndex]
    }
}
cdonat
  • 2,748
  • 16
  • 24
  • thank you. However, I'm trying to achieve this with no looping at all. Your solution gives me an O(log n), while mine gives an O(n) which is worse, I know, but still thats not the case. I posted a solution which is by far the best one. It uses a lookup 'table', suggested by Petr. Somehow the tileset searching takes longer than creating the lookup vector... I have no idea why – Yves Calaci Oct 19 '15 at 19:46