2

As far as I understand it, a bitmap index search will return an array of 0s and 1s, like below:

[0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]

Each index in the array is mapped to some record in the database in some other array, so to get results you need to find the indices of the nonzero elements in the results array.

What I don't understand is how do you find these indices in constant time? The best algorithm I can think of is to go through each element in the array, check if it's nonzero, and if it's nonzero then write the index of that element somewhere else. However, this would mean looking at every element in the array sequentially, which is linear time. So the time taken to return the results will be proportional to the size of the results array, which is the same as the total number of rows in the table.

However, the bitmap index papers I've read seem to suggest that the query time is only proportional to the number of hits and not to the total number of rows in the table. Reference:

http://crd-legacy.lbl.gov/~kewu/ps/LBNL-59952.pdf

Did I misunderstand something? Is the result of the bitmap index search not presented as an array but some other data structure that enables constant time search for nonzero elements?

user2108462
  • 855
  • 7
  • 23
  • 1
    Well, to be proportional to the number of hits, perhaps they're storing the non-zero indices instead of the actual bitmap (e.g. `[6, 7, 11]` in your case). Searching that would still be linear but "less linear" than searching the bitmap. Or perhaps the "constant time" comment is referring to testing whether a specific index is dirty, which is just a single array access in the bitmap, as opposed to finding all dirty indices... – twalberg Dec 16 '14 at 20:16

1 Answers1

1

The key is to properly store the index: say, using the Run-Length-Encoded compression you mentioned in another question. Each suitable lookup will take O(hits) time then.

Danylo Mysak
  • 1,514
  • 2
  • 16
  • 22
  • What kind of algorithm would be able to work on an array encoded in such a way? – user2108462 Dec 16 '14 at 20:44
  • 1
    For Run-Length-Encoded you'll basically have a sequence of [[0, 6], [1, 2], [0, 3], [1, 1], [0, 3]] instead of [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]. As you can, see it has a maximum of (number of 1s)×2+1 elements, so its length is O(1s), i.e. O(number of hits). You just go through the list, add the numbers up or do whatever you want with them. – Danylo Mysak Dec 16 '14 at 20:50
  • Ah, so to find the index of the first 1 in that list, I'd look at the second number in the first sub-list ([0, 6]) which is 6? And continue adding the numbers on to get the index of the next 1 and so on? – user2108462 Dec 16 '14 at 20:54
  • 1
    Exactly. But not to forget that there can be several ones in succession. – Danylo Mysak Dec 16 '14 at 20:56