Improve speed performance of a python code

Question

The function transform_input does the following:

The combination parameter is a list of tuples (the size of this list is usually maxed at 6) containing 1 or 2 values.
It then removes all redundancy inside combination. What I call redundancy is to remove duplicates and to remove tuples that contains the same values swapped. Examples of redundant tuples: (2,3) and (3,2), (7,) and (7,), (1,2) and (1,2).
The resulting tuple must be in ascending order. Example: (2,4) and not (4,2).
The order of the resulting list is not important.
All tuple values are positive integers.

The code I came up with is the one below (with a running example):

def transform_input(combination):
  combination = [list(elem) for elem in combination]
  for t in combination:
      t.sort()
  combination = [tuple(elem) for elem in combination]
  combination = list(set(combination))
  return combination

my_input = [
[(3,8), (8,3), (7,)],
[(8,8), (8,8)],
[(3,), (7,), (6,), (3,), (7,)],
[(2,3), (3,2)],
[(2,), (2,)]
]

for comb in my_input:
  print(transform_input(comb))

The output is:

[(3, 8), (7,)]
[(8, 8)]
[(6,), (7,), (3,)]
[(2, 3)]
[(2,)]

Is there a way that I can improve the speed performance of transform_input?

This piece of code is executed a lot in my real program, every improvement would be great.

PieCot · Accepted Answer · 2020-05-03T09:33:34.960

Here two other possible approaches; the first is a slight variation of the answer posted by Błotosmętek, using {} rather than set (it should be faster on lists with very few elements, take a look here and here).

def transform_input3(combination):
    return list({tuple(sorted(elem)) for elem in combination})

The other approach is based on the map function (i.e., using iterators).

def transform_input4(combination):
    return list(set(map(tuple, map(sorted, combination))))

If the inner tuples cannot contain more than 2 elements, you can implement something like this to avoid sorting each of them and conversion from list to tuple:

def transform_input6(combination):
    return [t for t in {x[::-1] if x[0] > x[-1] else x for x in combination}]

I added some items to your my_input (and the one used by alireza yazdandoost) and benchmarked all the approaches, running every function 100 times on each item; this is the new my_input:

my_input = [
    [(3, 8), (8, 3), (7,)],
    [(3, 8), (8, 3), (7,), (3, 8), (8, 3), (7,)],
    [(8, 8), (8, 8)],
    [(3,), (7,), (6,)],
    [(3,), (7,), (6,), (3,), (7,), (3,)],
    [(2, 3), (3, 2), (2, 3), (3, 2), (2, 3), (3, 2), (2, 3), (3, 2), (2, 3), (3, 2), (2, 3), (3, 2)],
    [(2,), (2,)],
    [(2,), (2,), (2,), (2,), (2,), (2,)],
    [(3, 8), (8, 3), (7,), (2, 3), (3, 2), (2, 3)],
    [(3, 8), (8, 3), (7,), (3, 4), (8, 8), (8, 8), (3,), (7,), (6,), (3,), (7,), (2, 3), (3, 2), (2,), (2,)],
]

These are the results of the benchmark (my_transform is your original code, my_transform2 is Błotosmętek's answer and my_transform5 is alireza yazdandoost's answer), as mean and standard deviation of the execution time (in sec):

[(3, 8), (8, 3), (7,)]
transform_input: 2.524e-06 (9.738e-07)
transform_input2: 2.067e-06 (2.728e-07)
transform_input3: 1.750e-06 (1.570e-07)
transform_input4: 1.804e-06 (1.984e-07)
transform_input5: 1.116e-06 (2.566e-07)
transform_input6: 1.051e-06 (1.874e-07)

[(3, 8), (8, 3), (7,), (3, 8), (8, 3), (7,)]
transform_input: 3.743e-06 (2.239e-07)
transform_input2: 3.308e-06 (2.461e-07)
transform_input3: 2.893e-06 (1.354e-07)
transform_input4: 2.848e-06 (9.448e-07)
transform_input5: 1.710e-06 (2.458e-07)
transform_input6: 1.477e-06 (1.823e-07)

[(8, 8), (8, 8)]
transform_input: 1.894e-06 (2.101e-07)
transform_input2: 1.641e-06 (1.979e-07)
transform_input3: 1.370e-06 (1.294e-07)
transform_input4: 1.593e-06 (1.063e-06)
transform_input5: 9.250e-07 (2.063e-07)
transform_input6: 8.319e-07 (1.358e-07)

[(3,), (7,), (6,)]
transform_input: 2.415e-06 (1.107e-06)
transform_input2: 1.944e-06 (2.173e-07)
transform_input3: 1.635e-06 (1.073e-07)
transform_input4: 1.710e-06 (1.585e-07)
transform_input5: 1.079e-06 (1.747e-07)
transform_input6: 1.061e-06 (9.636e-07)

[(3,), (7,), (6,), (3,), (7,), (3,)]
transform_input: 3.653e-06 (1.049e-06)
transform_input2: 3.152e-06 (2.191e-07)
transform_input3: 2.749e-06 (1.331e-07)
transform_input4: 2.576e-06 (1.495e-07)
transform_input5: 1.674e-06 (2.078e-07)
transform_input6: 1.285e-06 (1.356e-07)

[(2, 3), (3, 2), (2, 3), (3, 2), (2, 3), (3, 2), (2, 3), (3, 2), (2, 3), (3, 2), (2, 3), (3, 2)]
transform_input: 6.527e-06 (2.066e-07)
transform_input2: 5.980e-06 (2.107e-07)
transform_input3: 5.370e-06 (1.381e-07)
transform_input4: 5.504e-06 (4.534e-06)
transform_input5: 2.974e-06 (1.017e-06)
transform_input6: 2.422e-06 (1.929e-07)

[(2,), (2,)]
transform_input: 1.830e-06 (1.616e-07)
transform_input2: 1.565e-06 (2.068e-07)
transform_input3: 1.317e-06 (1.332e-07)
transform_input4: 1.541e-06 (1.074e-06)
transform_input5: 9.414e-07 (1.911e-07)
transform_input6: 8.253e-07 (1.391e-07)

[(2,), (2,), (2,), (2,), (2,), (2,)]
transform_input: 3.798e-06 (1.423e-06)
transform_input2: 3.165e-06 (2.080e-07)
transform_input3: 2.767e-06 (1.471e-07)
transform_input4: 2.602e-06 (1.492e-07)
transform_input5: 1.718e-06 (2.056e-07)
transform_input6: 1.255e-06 (1.435e-07)

[(3, 8), (8, 3), (7,), (2, 3), (3, 2), (2, 3)]
transform_input: 3.762e-06 (2.038e-07)
transform_input2: 3.323e-06 (2.212e-07)
transform_input3: 2.923e-06 (1.341e-07)
transform_input4: 2.763e-06 (1.599e-07)
transform_input5: 1.663e-06 (2.349e-07)
transform_input6: 1.468e-06 (1.796e-07)

[(3, 8), (8, 3), (7,), (3, 4), (8, 8), (8, 8), (3,), (7,), (6,), (3,), (7,), (2, 3), (3, 2), (2,), (2,)]
transform_input: 7.692e-06 (3.647e-07)
transform_input2: 7.048e-06 (2.708e-07)
transform_input3: 6.550e-06 (1.439e-06)
transform_input4: 6.143e-06 (2.281e-06)
transform_input5: 3.450e-06 (2.875e-07)
transform_input6: 2.544e-06 (2.064e-07)

Here you can find the code used to produce the results and you can play with it.

very good, I think you have just forgotten to cast the final response to the list. — alireza yazdandoost, May 03 '20 at 07:25

score 1 · Answer 2 · answered May 02 '20 at 13:18

1

This might be a little bit more efficient (not sure, check):

def transform_input(combination):
  return list(set([tuple(sorted(elem)) for elem in combination]))

answered May 02 '20 at 13:18

Błotosmętek

12,717
19
29

alireza yazdandoost · Answer 3 · 2020-05-02T14:45:17.123

As you know tuples are hashable, so you can use them as dict keys.

As you may know, inserting to and searching in hashtables is done in O(1) order.

So using a hashtable(dict in python) may help your code to be improved.

I wrote the transform_input2 based on the above facts.

I know It's not very well readable but it enhanced the performance.

*I changed your code a bit to be able to calculate the elapsed time in each solution better.

import time

def transform_input3(combination):
    print("start")
    start = time.time()
    res = list(set([tuple(sorted(elem)) for elem in combination]))
    end = time.time()
    print(end - start)
    return res


def transform_input2(combination):
    print("start")
    start = time.time()
    hashtable = {}
    for t in combination:
        rev = t[::-1]
        if rev in hashtable:
            if t[0]<rev[0]:
                hashtable[(t[0],t[1])] = None
            else:
                continue
        else:
            hashtable[t] = None
    res = list(hashtable.keys())    
    end = time.time()
    print(end - start)
    return res

def transform_input(combination):
    print("start")
    start = time.time()
    combination = [list(elem) for elem in combination]
    for t in combination:
        t.sort()
    combination = [tuple(elem) for elem in combination]
    combination = list(set(combination))
    end = time.time()
    print(end - start)
    return combination

my_input = [
(3,8), (8,3), (7,), (3,4),
(8,8), (8,8),
(3,), (7,), (6,), (3,), (7,),
(2,3), (3,2),
(2,), (2,)
]
print(transform_input(my_input))
print(transform_input2(my_input))
print(transform_input3(my_input))

here is the result for one execution:

start
3.600120544433594e-05
[(2,), (3,), (3, 8), (8, 8), (2, 3), (6,), (7,), (3, 4)]
start
1.0967254638671875e-05
[(3, 8), (7,), (3, 4), (8, 8), (3,), (6,), (2, 3), (2,)]
start
1.9788742065429688e-05
[(2,), (3,), (3, 8), (8, 8), (2, 3), (6,), (7,), (3, 4)]

Thank you for taking your time to help me out! – ihavenoidea May 03 '20 at 19:58 — ihavenoidea, May 03 '20 at 19:58
@ihavenoidea no problem :) – alireza yazdandoost May 03 '20 at 20:16 — alireza yazdandoost, May 03 '20 at 20:16

Improve speed performance of a python code

3 Answers3