-2

I'm running the following code:

x = []
for i in c:
    x = x+i

The result has about 50-100 million elements.

This takes several minutes to run on my PC. How can I accelerate this?

Patrick Artner
  • 50,409
  • 9
  • 43
  • 69
cjm2671
  • 18,348
  • 31
  • 102
  • 161
  • sum(c) is definitely better code, I hadn't thought of that. Still seems to be slow though. – cjm2671 Nov 01 '18 at 17:46
  • But *what is `c`*? Is it a list, NumPy array, tuple or something else? – jpp Nov 01 '18 at 17:47
  • It's a list (though I could cast it to something else) – cjm2671 Nov 01 '18 at 17:47
  • A list of lists? A list of tuples? A list of floats? Objects? Can we see a sample? – PMende Nov 01 '18 at 17:49
  • 1
    No `sum` is **not better code**. Use `.extend` to concatenate in a loop, that will have amoratized constant time behavior instead of quadratic. Or use `+=` (which calls extend). Just plain list concatenation, `+` is O(A+B), so to concatenate a bunch of small lists, this creates quadratic time complexity on the size of the final list. Instead, `extend` is amoratized O(B) where B is the size of the smaller list, so it is O(N) on the size of the final list. – juanpa.arrivillaga Nov 01 '18 at 17:49
  • A list of lists [[a, b, c], [b, c, d]...] – cjm2671 Nov 01 '18 at 17:50
  • 2
    @cjm2671, Please provide a **[mcve]**. – jpp Nov 01 '18 at 17:52
  • It looks like you just want to flatten your list of lists, in that case: `from itertools import chain; x = list(chain.from_iterable(c))` – juanpa.arrivillaga Nov 01 '18 at 17:53
  • @juanpa.arrivillaga Running some minimal tests, `+=` runs approximately 20-30% faster than `extend`. So I'd be surprised if they are doing the same thing. – PMende Nov 01 '18 at 17:55
  • += runs in about 4 seconds. Compared to 10 minutes, I consider this to be a real victory. – cjm2671 Nov 01 '18 at 17:56
  • 2
    @PMende they are doing the same thing. `+=` is just faster because the interpreter doesn't have to resole `c.extend`, attribute access is slow, `+=` will shortcut it – juanpa.arrivillaga Nov 01 '18 at 17:57
  • 2
    @cjm2671 try `list(chain.from_iterable(c))`, should be the fastest – juanpa.arrivillaga Nov 01 '18 at 17:58
  • list(chain.from_iterable(c)) = 3.51 seconds - that's the winner! – cjm2671 Nov 01 '18 at 17:59
  • @juanpa.arrivillaga Makes sense. Thanks! Looks like `+=` and `chain.from_iterable` are the fastest of the options. – PMende Nov 01 '18 at 17:59
  • Would you mind using latest version of pypy? It is said to be generally alot faster than cpython implementation. Try it and see for yourself if it is so. – RyuCoder Nov 01 '18 at 18:32
  • https://stackoverflow.com/questions/12088089/python-list-concatenation-efficiency – Serge Nov 01 '18 at 18:33
  • @juanpa.arrivillaga why not post your answer as...well...an answer? The question asks how to accelerate the computation: so your answer is definitely valid...admittedly there are *many* answers, though. – Matt Messersmith Nov 01 '18 at 19:11

1 Answers1

0

Already compared in join list of lists in python

with Python 2 being faster with .extend than with itertools.chain

An exotique method

l = [] 
for x in c: l[0:0] = x 

among the faster according to stackoverflow.com/questions/12088089/…

For python 3.5 and latter, even more exotic

l = [] 
for x in c: 
    l = [l, *x]

Of course sum(c, []) is among worst on all the measurements.

Serge
  • 3,387
  • 3
  • 16
  • 34