
I have quite a large number of data sets to extend.

I'm wondering what would be an alternative/faster way of doing it.

I have tried both __iadd__ and extend; both of them take quite a while to produce output.

from timeit import timeit

raw_data = []
raw_data2 = []
added_data = list(range(100000))  # a list, so added_data * i works in Python 3

# .__iadd__
def test1():
    for i in range(10):
        raw_data.__iadd__(added_data * i)

# .extend
def test2():
    for i in range(10):
        raw_data2.extend(added_data * i)


print(timeit(test1, number=2))
print(timeit(test2, number=2))

I feel a list comprehension or array mapping could be the answer to my question ...
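For example, something along the lines of this flattening comprehension (just a sketch of what I mean):

added_data = list(range(100000))

# build the repeated list in one pass with a nested comprehension
raw_data = [x for _ in range(10) for x in added_data]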

Gооd_Mаn

2 Answers


If you need your data as a list, there is not much to gain: list.extend and __iadd__ are very close in performance, and depending on the amounts involved one or the other comes out fastest:

import timeit
from itertools import repeat, chain

added_data = range(100000)  # to verify the data, use range(5) and uncomment the prints

def iadd():
    raw_data = []
    for _ in range(10):
        raw_data.__iadd__(added_data)
    # print(raw_data)

def extend():
    raw_data = []
    for _ in range(10):
        raw_data.extend(added_data)
    # print(raw_data)

def tricked():
    raw_data = list(chain.from_iterable(repeat(added_data, 10)))
    # print(raw_data)

for name, func in (("__iadd__", iadd), ("  extend", extend), (" tricked", tricked)):
    print(name, end=" : ")
    print("{:.8f}".format(timeit.timeit(func, number=200)))

Output:

# number = 20
__iadd__ : 0.69766775
  extend : 0.69303196    # "fastest"
 tricked : 0.74638002


# number = 200
__iadd__ : 6.94286992    # "fastest"
  extend : 6.96098415
 tricked : 7.46355973

If you do not need an actual list, you might be better off consuming the generator chain.from_iterable(repeat(added_data, 10)) directly, without ever creating the list, to reduce the amount of memory used.
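For example, a minimal sketch of consuming the repeated data lazily; sum here is just a stand-in for whatever processing you actually do:

from itertools import chain, repeat

added_data = range(100000)

# iterate over the 10 repetitions lazily; no intermediate list is ever built
lazy_data = chain.from_iterable(repeat(added_data, 10))
total = sum(lazy_data)  # placeholder for your real processing
print(total)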


Patrick Artner

I'm unsure if there is a better way to do this, but using numpy and ctypes you can preallocate enough memory for the entire array and then use ctypes.memmove to copy data into raw_data, which is now a ctypes array of ctypes.c_long elements.

from timeit import timeit
import ctypes
import numpy

def test_iadd():
    raw_data = []
    added_data = range(1000000)

    for _ in range(10):
        raw_data.__iadd__(added_data)


def test_extend():
    raw_data = []
    added_data = range(1000000)

    for _ in range(10):
        raw_data.extend(added_data)


def test_memmove():
    added_data = numpy.arange(1000000)  # numpy equivalent of range
    # NOTE: this assumes numpy's default integer dtype has the same size as
    # ctypes.c_long on your platform; if not, use
    # numpy.arange(1000000, dtype=numpy.dtype(ctypes.c_long))

    # preallocate a ctypes array large enough for all ten sections
    raw_data = (ctypes.c_long * (len(added_data) * 10))()

    # the address to copy to
    raw_data_addr = ctypes.addressof(raw_data)
    # the length of added_data in bytes
    added_data_len = len(added_data) * ctypes.sizeof(ctypes.c_long)
    for _ in range(10):
        # copy the data for one section
        ctypes.memmove(raw_data_addr, added_data.ctypes.data, added_data_len)
        # advance the destination address past the section just copied
        raw_data_addr += added_data_len


tests = [test_iadd, test_extend, test_memmove]

for test in tests:
    print('{} {}'.format(test.__name__, timeit(test, number=5)))

This code produced the following results on my PC:

test_iadd 0.648954868317
test_extend 0.640357971191
test_memmove 0.201567173004

This appears to show that using ctypes.memmove is significantly faster.
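As an aside, if a numpy array is an acceptable final container, the same preallocate-and-copy idea can be written without ctypes at all; this is a sketch using numpy.tile, not a benchmarked claim:

import numpy

added_data = numpy.arange(1000000)

# tile allocates the full output once and copies added_data into it 10 times
raw_data = numpy.tile(added_data, 10)
assert len(raw_data) == 10 * len(added_data)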

Oli