5

I was working on an algorithm and in that, we are trying to write every line in the code such that it adds up a good performance to the final code.

In one situation we have to add lists (more than two specifically). I know some of the ways to join more than two lists also I have looked upon StackOverflow but none of the answers are giving account on the performance of the method.

Can anyone show, what are the ways we can join more than two lists and their respective performance?

Edit : The size of the list is varying from 2 to 13 (to be specific). Edit Duplicate : I have been specifically asking for the ways we can add and their respected questions and in duplicate question its limited to only 4 methods

3 Answers3

5

There are multiples ways using which you can join more than two list.

Assuming that we have three list,

a = ['1']
b = ['2']
c = ['3']

Then, for joining two or more lists in python,

1) You can simply concatenate them,

 output = a + b + c

2) You can do it using list comprehension as well,

res_list = [y for x in [a,b,c] for y in x] 

3) You can do it using extend() as well,

a.extend(b)
a.extend(c)
print(a)

4) You can do it by using * operator as well,

res = [*a,*b,*c]

For calculating performance, I have used timeit module present in python.

The performance of the following methods are;

4th method < 1st method < 3rd method < 2nd [method on the basis of time]

That means If you are going to use " * operator " for concatenation of more than two lists then you will get the best performance.

Hope you got what you were looking for.

Edit:: Image showing performance of all the methods (Calculated using timeit)

enter image description here

0xPrateek
  • 1,158
  • 10
  • 28
  • 2
    Why would you not include the timings and the code used to set them up? Also, why such small lists? – roganjosh Jun 15 '19 at 16:26
  • 1
    Ohh.. Going to add it . 1 min – 0xPrateek Jun 15 '19 at 16:27
  • I suspect the size of the lists to have an impact in terms of memory allocation – roganjosh Jun 15 '19 at 16:28
  • Since it was not given in the question about size so, i took it minimum – 0xPrateek Jun 15 '19 at 16:31
  • 1
    But you can't possibly answer the question if you're using the minimum size because it says nothing about complexity. Also, you should be posting the code and outputs as formatted text, not screenshots – roganjosh Jun 15 '19 at 16:32
  • 2
    Thanks for telling me that. I was not aware of that style. @roganjosh I am going to add another output which is of the performance of the methods when list size is large. – 0xPrateek Jun 15 '19 at 16:35
  • Hope it will add some quality to answer – 0xPrateek Jun 15 '19 at 16:35
  • Hey, Prateek your answer seems to be very expressive.@roganjosh sorry I forgot to add the size of the list. I think that the size of the list will matter only for the performance factor. – Paul kingser Jun 15 '19 at 16:39
  • 1
    @Paulkingser I thought the performance was the whole point of your question? It's mentioned 4 times in your question. – roganjosh Jun 15 '19 at 16:42
  • @0xPrateek and roganjosh will the performance from the performance you have shown since the size is not that big. It will always in that range only. 2 -13 – Paul kingser Jun 15 '19 at 16:43
  • 1
    Why did you not consider ``itertools.chain`` method? I did the same tests as you with larger lists and ``itertools.chain`` was hundreds of times faster. – AlCorreia Jun 15 '19 at 17:05
3

I did some simple measurements, here are my results:

import timeit

from itertools import chain

a = [*range(1, 10)]
b = [*range(1, 10)]
c = [*range(1, 10)]

tests = ("""output = list(chain(a, b, c))""",
"""output = a + b + c""",
"""output = [*chain(a, b, c)]""",
"""output = a.copy();output.extend(b);output.extend(c);""",
"""output = [*a, *b, *c]""",
"""output = a.copy();output+=b;output+=c;""",
"""output = a.copy();output+=[*b, *c]""",
"""output = a.copy();output += b + c""")

results = sorted((timeit.timeit(stmt=test, number=1, globals=globals()), test) for test in tests)

for i, (t, stmt) in enumerate(results, 1):
    print(f'{i}.\t{t}\t{stmt}')

Prints on my machine (AMD 2400G, Python 3.6.7):

1.  6.010000106471125e-07   output = [*a, *b, *c]
2.  7.109999842214165e-07   output = a.copy();output += b + c
3.  7.720000212430023e-07   output = a.copy();output+=b;output+=c;
4.  7.820001428626711e-07   output = a + b + c
5.  1.0520000159885967e-06  output = a.copy();output+=[*b, *c]
6.  1.4030001693754457e-06  output = a.copy();output.extend(b);output.extend(c);
7.  1.4820000160398195e-06  output = [*chain(a, b, c)]
8.  2.525000127207022e-06   output = list(chain(a, b, c))
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
2

If you are going to concatenate a variable number of lists together, your input is going to be a list of lists (or some equivalent collection). The performance tests need to take this into account because you are not going to be able to do things like list1+list2+list3.

Here are some test results (1000 repetitions):

option1 += loop          0.00097 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option2 itertools.chain  0.00138 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option3 functools.reduce 0.00174 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option4 comprehension    0.00188 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option5 extend loop      0.00127 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option6 deque            0.00180 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]

This would indicate that a += loop through the list of lists is the fastest approach

And the source to produce them:

allLists = [ list(range(10)) for _ in range(5) ]

def option1():
    result = allLists[0].copy()
    for lst in allLists[1:]:
        result += lst
    return result

from itertools import chain
def option2(): return list(chain(*allLists))

from functools import reduce
def option3():
    return list(reduce(lambda a,b:a+b,allLists))

def option4(): return [ e for l in allLists for e in l ]

def option5():
    result = allLists[0].copy()
    for lst in allLists[1:]:
        result.extend(lst)
    return result

from collections import deque
def option6():
    result = deque()
    for lst in allLists:
        result.extend(lst)
    return list(result)


from timeit import timeit
count = 1000

t = timeit(lambda:option1(), number = count)
print(f"option1 += loop          {t:.5f}",option1()[:15])

t = timeit(lambda:option2(), number = count)
print(f"option2 itertools.chain  {t:.5f}",option2()[:15])

t = timeit(lambda:option3(), number = count)
print(f"option3 functools.reduce {t:.5f}",option3()[:15])

t = timeit(lambda:option4(), number = count)
print(f"option4 comprehension    {t:.5f}",option4()[:15])

t = timeit(lambda:option5(), number = count)
print(f"option5 extend loop      {t:.5f}",option5()[:15])

t = timeit(lambda:option6(), number = count)
print(f"option6 deque            {t:.5f}",option6()[:15])
Alain T.
  • 40,517
  • 4
  • 31
  • 51
  • @PM 2Ring, Nice catch! thanks, this was completely skewing the results. Fixed it now. – Alain T. Jun 15 '19 at 18:13
  • No worries. I've done a fair few timeit tests, [eg](https://stackoverflow.com/a/47845960/4014959), so I'm wary of stuff like that. :) – PM 2Ring Jun 15 '19 at 18:24