For data compression, I want to replace values in a (long) list with an index that says how recently the value last appeared. So the list:
18499 10123 5678 10123 10123 3344 10123 5678 4912 18499
would be replaced as follows:
N18499 N10123 N5678 K1 K0 N3344 K1 K2 N4912 K4
New values that have not been seen before are prefixed with N. Old values that are already known are prefixed with K. The second occurrence of 10123, for example, is replaced with K1, as there is one other value (5678) in between. However, to keep the indexes as low as possible, I want the K-indexes to measure not the distance in the list, but the number of unique other values seen between the last and the current occurrence of a value. So, for example, the second occurrence of 5678 is replaced with K2, as there are two other values (10123 and 3344) between them, even though 10123 is repeated a few times. Similarly, the last value 18499 is replaced with K4, as there are four other unique values between it and the beginning of the list (which is also 18499). If the plain distance were measured instead, the last element would be K8, as eight list elements lie between the two occurrences.
At first glance, it looks as if this compression/index replacement can be done using an LRU cache, for which Stack Overflow holds some very good references (namely on LRU cache design). Unfortunately, though, classic LRU caches are not very good for this purpose: checking whether an item is among the last N items (N being the LRU cache size) is fast at O(1), but looking up the actual position of an item in the LRU cache is O(n), where n is the number of elements in front of the found one.
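To make the problem concrete, here is a minimal sketch of the naive LRU-style encoder described above, assuming integer values and string tokens. The names `encode`, `recency` and `where` are only illustrative. The hash map gives the O(1) membership test, but the `std::distance` call walks the list and is the O(n) step I would like to get rid of.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

// Naive encoder built on a classic LRU layout (linked list + hash map).
// Illustrative only: it shows where the O(n) position lookup happens.
std::vector<std::string> encode(const std::vector<int>& input) {
    std::list<int> recency;  // front = most recently seen value
    std::unordered_map<int, std::list<int>::iterator> where;
    std::vector<std::string> out;
    out.reserve(input.size());

    for (int value : input) {
        auto hit = where.find(value);  // O(1) on average: is the value known?
        if (hit == where.end()) {
            // New value: emit it literally and put it at the front.
            out.push_back("N" + std::to_string(value));
            recency.push_front(value);
            where[value] = recency.begin();
        } else {
            // Known value: its list position equals the number of unique
            // other values seen since its last occurrence. This walk is O(n).
            auto index = static_cast<std::size_t>(
                std::distance(recency.begin(), hit->second));
            out.push_back("K" + std::to_string(index));
            // Move the node to the front; splice keeps the iterator valid.
            recency.splice(recency.begin(), recency, hit->second);
        }
    }
    return out;
}
```

Fed with the list from the example, this produces the token sequence shown at the top of the question.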
The same slowness affects the decompression step, when Kn needs to be replaced with the corresponding value again: walking through the linked list of a classic LRU cache to find the n-th item requires n steps.
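The decompression side can be sketched the same way, again only as an illustration with assumed names (`decode`, `recency`) and the assumption that tokens are strings such as "N5678" or "K2". Here `std::next` is the step that walks n list nodes for every Kn token.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <stdexcept>
#include <string>
#include <vector>

// Naive decoder: keeps the same recency list as the encoder and resolves
// each Kn token by walking n nodes from the front (the O(n) step).
std::vector<int> decode(const std::vector<std::string>& tokens) {
    std::list<int> recency;  // front = most recently seen value
    std::vector<int> out;
    out.reserve(tokens.size());

    for (const std::string& token : tokens) {
        if (token.size() < 2) {
            throw std::invalid_argument("malformed token: " + token);
        }
        if (token[0] == 'N') {
            // New value: read it literally and put it at the front.
            int value = std::stoi(token.substr(1));
            recency.push_front(value);
            out.push_back(value);
        } else if (token[0] == 'K') {
            std::size_t index = std::stoul(token.substr(1));
            if (index >= recency.size()) {
                throw std::out_of_range("index out of range: " + token);
            }
            // Walk `index` nodes from the front: O(n) on a linked list.
            auto it = std::next(recency.begin(),
                                static_cast<std::ptrdiff_t>(index));
            out.push_back(*it);
            // Move the found node to the front, mirroring the encoder.
            recency.splice(recency.begin(), recency, it);
        } else {
            throw std::invalid_argument("unknown token prefix: " + token);
        }
    }
    return out;
}
```

Decoding the token sequence from the example gives back the original list, so the scheme itself round-trips; only the cost per token is the issue.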
I am quite aware that an O(1) solution doesn't exist for my problem, as the counters recording how many other items are in front would need to be updated each time a new element is added to the cache or an existing one is moved to the front. But is there an O(log(n)) solution? If at all possible, I don't want to settle for O(n).
I want to implement this in C++, but pointers to such a data structure in any other programming language would also be appreciated. I am asking more about an algorithm than a specific implementation here.