0

I'm working on a leet problem and I'm just doing this for the fun of it. I think I can go as far as cracking all of the test inputs of the problem. So say, I know ahead of time all of the inputs, in order. What is the fastest structure to return the solutions.

I tried using a dict to map all inputs with the supposed solutions.

class DictSolution():
    DATA = {x:x for x in range(1000000)}

    def compute(self, x):
        return self.DATA[x]

Then, I thought, since I know in what order the inputs will be tested, I don't need to "look it up". So I tried using a set and ignoring all the inputs.

class SetSolution():
    DATA = {i for i in range(1000000)}

    def compute(self, x):
        return self.DATA.pop()

To my surprise, it was slightly slower than the dict, 1-2% slower everytime. Btw, here's how I time them.

def test_dict():
    sol = DictSolution()
    for i in range(1000000):
        sol.compute(i

ds = timeit.timeit(test_dict, number=1)
ss = timeit.timeit(test_set,  number=1)
print ("Dict Solution:", ds)
print ("Set  Solution:", ss)

>> Dict Solution: 0.11734077199999998
>> Set  Solution: 0.11939082499999998

Questions:

  1. Why is the set slower?
  2. Logically speaking, returning something in order should be faster than looking that thing up in a table. So I don't believe the dict way is the fastest already. What can I do to achieve better time?
thithien
  • 235
  • 1
  • 12
  • 1
    A set is unordered by definition. Why not use a list and just access by index without popping? – user2390182 Aug 26 '18 at 17:14
  • @schwobaseggl That is 10 times slower than both. I tried it already. I know `set` is unordered, but if nothing messes with it, the order should usually be retained. Plus I'm just going for time anyways. – thithien Aug 26 '18 at 17:14
  • @WaterWinter "but if nothing messes with it, the order should usually be retained", no, not really. The solution is simply incorrect. What exactly are you doing with the `dict`, `list` and `set` though? – juanpa.arrivillaga Aug 26 '18 at 17:35
  • @juanpa.arrivillaga Like said in the post. Say I know ahead of time what the inputs will be and in what order they will be inputted. I'm trying to figure out the fastest way to return the solution without actually processing/calculating the proper outputs. It's just a curious question, there's no purpose behind it beside figuring out if the `dict` is already the best thing in this case. "The solution is simply incorrect." I do acknowledge that a `set` is just not the proper thing to use in this case. However, it passes the test cases, and I'm speaking time-wise so it doesn't really matter. – thithien Aug 26 '18 at 17:43

2 Answers2

1

I believe the suggestion from @schwobaseggl is correct. From here, the complexity of accessing an element is O(1) for both dict and list, this was somewhat replicated in this question, in fact in the previous setting list was sligthly faster than dict. I replicated your benchmark, and added other data structures, in full:

  • List
  • Dictionary
  • Set (using pop)
  • Deque (from collections)
  • Tuple

The code:

import timeit
from collections import deque


class DictSolution:
    DATA = {x: x for x in range(1000000)}

    def compute(self, x):
        return self.DATA[x]


class SetSolution:
    DATA = {i for i in range(1000000)}

    def compute(self, x):
        return self.DATA.pop()


class DequeSolution:
    DATA = deque(i for i in range(1000000))

    def compute(self, x):
        return self.DATA.popleft()


class ListSolution:
    DATA = [i for i in range(1000000)]

    def compute(self, x):
        return self.DATA[x]


class TupleSolution:
    DATA = tuple(i for i in range(1000000))

    def compute(self, x):
        return self.DATA[x]


def test_dict():
    sol = DictSolution()
    for i in range(1000000):
        sol.compute(i)


def test_set():
    sol = SetSolution()
    for i in range(1000000):
        sol.compute(i)


def test_deque():
    sol = DequeSolution()
    for i in range(1000000):
        sol.compute(i)


def test_list():
    sol = ListSolution()
    for i in range(1000000):
        sol.compute(i)


def test_tuple():
    sol = TupleSolution()
    for i in range(1000000):
        sol.compute(i)


def test_pop_list():
    sol = PopListSolution()
    for i in range(1000000):
        sol.compute(i)


des = timeit.timeit(test_deque, number=1)
ss = timeit.timeit(test_set, number=1)
ds = timeit.timeit(test_dict, number=1)
ls = timeit.timeit(test_list, number=1)
ts = timeit.timeit(test_tuple, number=1)
times = [("Dict Solution:", ds), ("Set Solution:", ss), ("Deque Solution:", des), ("List Solution:", ls), ("Tuple Solution:", ts)]

for label, time in sorted(times, key=lambda e: e[1]):
    print(label, time)

Output

Tuple Solution: 0.1597294129896909
List Solution: 0.16653884798870422
Dict Solution: 0.17414769899914972
Set Solution: 0.190879073983524
Deque Solution: 0.1914772919844836

I ran the script several times and the results were similar with the tuple solution and list solution alternating the lead. Note that both SetSolution and DequeSolution were the slowest. So to answer your questions:

  1. Both set and deque are slower because you are deleting an element from the list, in the other structures you are only accessing the elements.
  2. This was partially answered in previous answer, pop does not only returns an element from the data structure it also deletes the element from it. So I will expect that updating a data structure will be slower than accessing only one of the elements of it.

Notes Although pop works with set for the test cases, in general this behaviour is not expected, for instance:

test = {'e' + str(i) for i in range(10)}
while test:
    print(test.pop())

Output (set pop)

e5
e8
e6
e0
e1
e3
e7
e4
e9
e2

More on this topic can be found here.

I also benchmarked a solution using list.pop(0) albeit using a smaller range: 100000, and fewer candidates (list, tuple and dict). The results were the following:

('List Solution:', 0.018702030181884766)
('Tuple Solution:', 0.021403074264526367)
('Dict Solution:', 0.02230381965637207)
('List Pop Solution', 1.8658080101013184)

The benchmark was ran on the following setup:

Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz
16GB
Ubuntu 16.04
Python 3.5.2
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • Thanks for the extensive answer. However, your tuple and list solution do not fit my context. As said, I've cracked all of the test cases and known ahead of time what the inputs are and in what order they'll be tested. The inputs, however, can be anything. For instance, the purpose of `compute()` is to double a number, and I know that the test cases are 10, 2000, 3. For the queue solution, I can just have a `20, 4000, 6` queue and deque them in order and pass all of the test cases without doing the actual work. The inputs are not 1,2,3,4 so saying `self.DATA[x]` would not work. – thithien Sep 03 '18 at 21:48
  • I added an extra variable to track the index and ignore the input. Without a doubt, both the tuple and list go to the bottom of the results. And `dict` is still the fastest. – thithien Sep 03 '18 at 21:51
  • With my human logic, I just can't wrap my head around why looking for something in a shuffle bag is the fastest way of returning a sequence of items in order. It just does not make sense to me at all and I strongly believe there should be a better way to do it than to look for it in the dict. – thithien Sep 03 '18 at 21:54
0

The dict is the fastest lookup data structure because it is implemented using a hash table: it nearly takes constant time to look up a key in a hash table. Check out the link below for more info:

This pdf from MIT explains the subject

moctarjallo
  • 1,479
  • 1
  • 16
  • 33
  • 3
    A set also hashes its values so isn't a set lookup essentially identical to the dictionary one ? – Evya Aug 26 '18 at 19:00
  • @EvyatarMeged probably because the set handles more other stuff than the dict: like keeping the uniqueness of elements, so on.. – moctarjallo Aug 27 '18 at 11:22
  • 1
    but uniqueness is only tested when you are adding to the set - that shouldn’t affect lookup times whatsoever. – Evya Aug 27 '18 at 11:27