1

I have a list of dicts that I need to divide into chunks:

input_size = len(input_rows)  # num of dicts
slice_size = int(input_size / 4)  # size of each chunk
remain = input_size % 4  # num of remaining dicts which cannot be divided into chunks
result = []  # initializes the list for containing lists of dicts
iterator = iter(input_rows)  # gets an iterator over the input
for i in range(4):
    result.append([])  # creates an empty list as an element/block in result for containing rows for each core
    for j in range(slice_size):
        result[i].append(iterator.__next__())  # push in rows into the current list
    if remain:
        result[i].append(iterator.__next__())  # push in one remainder row into the current list
        remain -= 1

input_rows is a list of dicts that I divide into 4 chunks/slices. If the number of dicts is not evenly divisible by 4, the remaining dicts are added one each to the first chunks. The list result holds the chunks, and each chunk is itself a list of dicts.

I am wondering how to do it in a more efficient way.

Tom Zych
daiyue
  • Possible duplicate of [Splitting a list into N parts of approximately equal length](https://stackoverflow.com/questions/2130016/splitting-a-list-into-n-parts-of-approximately-equal-length) – Tom Zych Jul 16 '18 at 19:51
  • 1
    [This answer](https://stackoverflow.com/a/2135920/675568) does what you want. – Tom Zych Jul 16 '18 at 19:51

3 Answers

2

Using the Standard Lib

R = list()
L = list(range(10))

remainder = len(L) % 4    # number of chunks that get one extra element
chunk_size = len(L) // 4  # base size of each chunk (integer division)
position = 0

while position < len(L):
    this_chunk = chunk_size
    if remainder:
        this_chunk += 1
        remainder -= 1
    R.append(L[position:this_chunk + position])
    position += this_chunk

print(R)
[[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]

This should be quicker because you iterate and insert far less. Instead of appending every element individually, you take just 4 slices and append 4 times, with the slice boundaries computed from the list's length.
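
To check the speed claim on your own data, a rough timing sketch along the following lines may help. The function names `chunk_by_append` and `chunk_by_slice` and the sample rows are made up for illustration. The slicing variant here loops over the chunk count, so unlike the `while` loop above it always returns 4 chunks. Timings will vary by machine and input size, so no figures are shown.

```python
import timeit


def chunk_by_append(rows, n=4):
    """Element-by-element approach, similar to the loop in the question."""
    size, remain = divmod(len(rows), n)
    it = iter(rows)
    result = []
    for _ in range(n):
        chunk = []
        for _ in range(size):
            chunk.append(next(it))
        if remain:
            chunk.append(next(it))  # hand out one leftover row
            remain -= 1
        result.append(chunk)
    return result


def chunk_by_slice(rows, n=4):
    """Slice-based approach: one slice and one append per chunk."""
    size, remain = divmod(len(rows), n)
    result, position = [], 0
    for i in range(n):
        width = size + (1 if i < remain else 0)
        result.append(rows[position:position + width])
        position += width
    return result


# Hypothetical sample data: a list of small dicts.
rows = [{"id": i} for i in range(100_003)]
assert chunk_by_append(rows) == chunk_by_slice(rows)

for fn in (chunk_by_append, chunk_by_slice):
    seconds = timeit.timeit(lambda: fn(rows), number=10)
    print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```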

Additionally, this is exactly what numpy.array_split is for, and it should be faster still. Note that it returns NumPy arrays rather than lists:

>>> import numpy as np
>>> print(*np.array_split(range(10), 4))
[0 1 2] [3 4 5] [6 7] [8 9]

EDIT: Following feedback in the comments, and because the code above can return fewer chunks than requested when the input list is shorter than the number of chunks, here is an alternative function that does the same thing but always produces the requested number of chunks:

def array_split(input_list, chunks):
    chunk_size = len(input_list) // chunks
    remainder = len(input_list) % chunks
    new_list = list()
    position = 0

    while position < len(input_list):
        this_chunk = chunk_size
        if remainder:
            this_chunk, remainder = this_chunk + 1, remainder - 1
        new_list.append(input_list[position:this_chunk + position])
        position += this_chunk

    new_list.extend([[] for _ in range(chunks - len(new_list))])

    return new_list

def unit_test():
    L = [1, 2]
    print(array_split(L, 4))

    L = list(range(13))
    print(array_split(L, 3))

    L = list(range(22))
    print(array_split(L, 5))

>>> unit_test()
[[1], [2], [], []]
[[0, 1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13], [14, 15, 16, 17], [18, 19, 20, 21]]
sehafoc
  • Slightly different result from OP’s code if input length is less than 4, e.g., OP’s `[[1], [2], [], []]` vs. your `[[1], [2]]`. Some would prefer the latter result. – Tom Zych Jul 18 '18 at 14:49
  • @TomZych yeah... there are a few holes in the answer that I didn't consider to be in the scope of "how do I make this faster". Ideally, rather than a hard-coded 4, you'd have this in a function where you specify the number of chunks you want. If the input list were shorter than the requested number of chunks, your function could either just append the extra empty lists or blow up, depending on what the use case for the function is. – sehafoc Jul 18 '18 at 22:51
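
The two options mentioned in the last comment (pad with empty lists, or fail) could be sketched as a single function with a switch. The name `split_into_chunks` and the `strict` parameter are hypothetical and not part of any answer here; treat this as one possible design rather than a definitive one.

```python
def split_into_chunks(items, chunks, strict=False):
    """Split items into `chunks` lists whose lengths differ by at most one.

    With strict=False, short inputs are padded with empty lists.
    With strict=True (an assumed option), short inputs raise ValueError.
    """
    if chunks < 1:
        raise ValueError("chunks must be a positive integer")
    if strict and len(items) < chunks:
        raise ValueError(
            "cannot fill {} chunks from {} items".format(chunks, len(items))
        )

    quot, rem = divmod(len(items), chunks)
    result, start = [], 0
    for i in range(chunks):
        # The first `rem` chunks each take one extra item.
        end = start + quot + (1 if i < rem else 0)
        result.append(items[start:end])
        start = end
    return result


print(split_into_chunks([1, 2], 4))  # [[1], [2], [], []]

try:
    split_into_chunks([1, 2], 4, strict=True)
except ValueError as exc:
    print("strict mode:", exc)
```

Whether padding or raising is the better default depends on the caller, for example whether downstream workers can tolerate an empty chunk.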
1
myList = [1,2,3,4,5,6,9]
numOfChunks = 2
newList = []

for i in range(0, len(myList), numOfChunks):
  newList.append(myList[i:i + numOfChunks])

print (newList) # [[1, 2], [3, 4], [5, 6], [9]]
Nash
  • This happens to work correctly if the element count is 7 or 8. For other numbers, it doesn’t, even if you adjust numOfChunks. – Tom Zych Jul 16 '18 at 19:41
  • I didn't get your response; could you give me the example that you're trying? – Nash Jul 16 '18 at 21:14
  • Try it with six elements; you end up with only three chunks. Try it with nine, you get five chunks. Or, nine with `numOfChunks = 3`, you get three chunks. Not what OP is asking for. – Tom Zych Jul 16 '18 at 21:48
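
The distinction raised in these comments, chunks of a fixed size versus a fixed number of chunks, can be made concrete with a small side-by-side sketch. The function names `chunks_of_size` and `n_chunks` are illustrative only.

```python
def chunks_of_size(items, size):
    """Fixed chunk size: the number of chunks depends on len(items)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def n_chunks(items, n):
    """Fixed number of chunks: the chunk sizes depend on len(items)."""
    quot, rem = divmod(len(items), n)
    bounds = [i * quot + min(i, rem) for i in range(n + 1)]
    return [items[bounds[i]:bounds[i + 1]] for i in range(n)]


data = list(range(1, 10))  # nine elements

print(chunks_of_size(data, 2))  # [[1, 2], [3, 4], [5, 6], [7, 8], [9]]
print(n_chunks(data, 4))        # [[1, 2, 3], [4, 5], [6, 7], [8, 9]]
```

With nine elements, stepping by 2 yields five chunks, while asking for 4 chunks always yields exactly four, which is what the question requires.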
1

Looks like this won’t be closed as a duplicate any time soon, so I have adapted tixxit’s great answer from 2010 to this problem, converting his generator to a list comprehension and making things, I hope, easier to understand.

chunks = 4
quot, rem = divmod(len(input_rows), chunks)
divpt = lambda i: i * quot + min(i, rem)
result = [input_rows[divpt(i):divpt(i + 1)] for i in range(chunks)]
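
To see why the divide-point formula works, it may help to print the boundaries for a small case. The helper name `divide_points` is made up for this illustration; it evaluates the same expression as `divpt` for every index from 0 to `chunks`.

```python
def divide_points(length, chunks):
    """Return the slice boundaries i * quot + min(i, rem) for i = 0..chunks."""
    quot, rem = divmod(length, chunks)
    return [i * quot + min(i, rem) for i in range(chunks + 1)]


points = divide_points(10, 4)
print(points)  # [0, 3, 6, 8, 10]

# Consecutive boundaries give the chunk sizes.
sizes = [end - start for start, end in zip(points, points[1:])]
print(sizes)   # [3, 3, 2, 2]
```

For 10 items in 4 chunks, `quot` is 2 and `rem` is 2. The `min(i, rem)` term shifts each boundary by one extra item until the remainder is used up, so the first two chunks get 3 items and the rest get 2.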

The testing framework below shows that the resulting code generates exactly the same results as OP’s code.

def main():
    for top in range(1, 18):
        print("{}:".format(top))
        input_list = list(range(1, top + 1))

        daiyue = chunkify_daiyue(input_list[:])
        print('daiyue: {}'.format(daiyue))

        zych = chunkify_zych(input_list[:])
        match = 'Same' if (zych == daiyue) else 'Different'
        print('Zych:   {}   {}'.format(zych, match))

        print('')


def chunkify_daiyue(input_rows):
    "Divide into chunks with daiyue's code"

    input_size = len(input_rows)  # num of dicts
    slice_size = int(input_size / 4)  # size of each chunk
    remain = input_size % 4  # num of remaining dicts which cannot be divided into chunks

    result = []  # initializes the list for containing lists of dicts
    iterator = iter(input_rows)  # gets an iterator over the input

    for i in range(4):
        # creates an empty list as an element/block in result for
        # containing rows for each core
        result.append([])

        for j in range(slice_size):
            # push in rows into the current list
            result[i].append(iterator.__next__())
        if remain:
            # push in one remainder row into the current list
            result[i].append(iterator.__next__())
            remain -= 1

    return result


def chunkify_zych(input_rows):
    "Divide into chunks with Tom Zych's code"

    chunks = 4
    quot, rem = divmod(len(input_rows), chunks)
    divpt = lambda i: i * quot + min(i, rem)
    return [input_rows[divpt(i):divpt(i+1)] for i in range(chunks)]


if __name__ == '__main__':
    main()
Tom Zych