5

I have an already sorted set in memory of size (N) and want to dump it into redis, can it be done in O(N) if inserted head or tail first? or it doesn't matter, and the insertion will be O(log(N!)) ~ O(N log(N))

For further details, redis sorted sets are implemented using a hashmap and a skiplist (for the ordering).

EDIT: this question remains unanswered since quite a bit, or at least the answer is a bit ambiguous for me: Redis: Is ZADD better than O(logN) when the inserted element is at the beginning or end?

Community
  • 1
  • 1
Daren
  • 3,337
  • 4
  • 21
  • 35
  • IMO it doesn't matter – Itamar Haber Feb 13 '15 at 12:09
  • I'm not familiar with skiplists but for other ordered data structures (like trees), if you insert in certain order you do get a log(N)... and from the wikipedia article on skip-lists it does seem to be possible to get O(N) but is REDIS implementation the same? – Daren Feb 13 '15 at 12:12
  • Interesting question... I'll try benchmarking to see if I can identify any empirical differences. An alternative would be to read the source: https://github.com/antirez/redis :) – Itamar Haber Feb 13 '15 at 12:17
  • I tried reading the source but after 20 mins despaired... Sorry. Hadn't though about bench-marking though. – Daren Feb 13 '15 at 12:19
  • Now that there's a bounty I'm really motivated :P – Itamar Haber Feb 18 '15 at 03:44

2 Answers2

3

Here are the results from my "empirical" approach which suggest there may be a slight benefit to having order :)

(.venv)foo@bar:~/so_bounty$ python main.py
ascending order
5.57414388657
descending order
5.72963309288
random order
6.75937390327
0 score
5.79048109055
Daren
  • 3,337
  • 4
  • 21
  • 35
Itamar Haber
  • 47,336
  • 7
  • 91
  • 117
  • could you detail the method used in your simulations? iterations, data set sizes, etc. Thanks! – Daren Feb 18 '15 at 10:24
  • 1
    The link to the code is in the answer (https://gist.github.com/itamarhaber/6c9f3dd75ec5e25d8044) - I used set sizes of 10,000 and did a 100 iterations per ordering in a VM on a laptop - YMMV :) – Itamar Haber Feb 18 '15 at 11:06
  • Dam... tried replicating the experiment in java and didn't have conclusive results (with different set sizes, 1k, 10k and 100k). It can't be any stupid mistake like getting a random in python simply takes longer to calculate so random sets simply are slower to build? – Daren Feb 18 '15 at 12:15
  • Could be lots of stupid mistakes - I have and still am making these daily ;) - but I believe I avoided that by preparing the set before timing the inserts. – Itamar Haber Feb 18 '15 at 12:50
  • well, since nobody looked at the source code for an answer, you get the bounty. Enjoy. – Daren Feb 23 '15 at 11:24
  • woot! now that the bounty's secure, let's get real help :) – Itamar Haber Feb 23 '15 at 13:59
  • I double the bounty if you read the source code and make sure... :) but warn me, the answer must be after the new bounty is declared. – Daren Feb 24 '15 at 14:43
1

After having doubts on the methodology employed in another empirical approach, I have conducted my own insertion benchmark (all sets are initialized before timing insertion, for random insertion testing, the list of tuples is shuffled before we start timing) with the following results:

for an ordered set with 2k, 20k and 200k members:

  • head first: 196.29s | 1146.43s | 9897.29s
  • tail first: 170.14s | 993.43s | 9722.14s
  • rand insert: 146.00s | 1014.57s | 9968.57s

All with enough variability (standard devs at 7.8 | 54.5 | 324.5 respectively) so the difference are not significant enough for conclusions. Seems it doesn't matter... :(

Daren
  • 3,337
  • 4
  • 21
  • 35