2

The slicing operation some_list[i:i+k] takes O(k) time, since it has to copy the elements from index i to i+k into the new list, as per this.

I am confused about the time complexity of a pair of slicing operations like chunk_1, chunk_2 = some_list[:chunk_size], some_list[chunk_size:].

Correspondingly, what should be the overall time complexity of this kind of loop:

while some_list:
    chunk, some_list = some_list[:chunk_size], some_list[chunk_size:]

I suppose that in this loop, the cost of copying the elements into the new lists will also add to the overall cost.
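
To make that concrete, here is a rough count of how many elements the loop copies (the helper below is only an illustration, not part of the actual code): each pass copies chunk_size elements into the chunk plus the entire remaining tail into the new list, so the tail copies dominate and the total is roughly n^2 / (2 * chunk_size).

def copied_elements(n, chunk_size):
    # Count elements copied by: chunk, some_list = some_list[:chunk_size], some_list[chunk_size:]
    total, remaining = 0, n
    while remaining:
        take = min(chunk_size, remaining)
        total += take              # copy for the chunk slice
        total += remaining - take  # copy for the new tail slice
        remaining -= take
    return total

print(copied_elements(1000, 10))   # 50500
print(copied_elements(10000, 10))  # 5005000, ~100x more for 10x the input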

Is there a better way of breaking a large list into evenly sized chunks?

Update:

I did some profiling to check whether the while loop is an O(n^2) operation. Adding the results:

In [1]: def chunk(l, c):
   ...:     while l:
   ...:         l_c, l = l[:c], l[c:]
   ...:

In [2]: l = list(range(1000))

In [3]: %timeit chunk(l, 10)
134 µs ± 702 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [4]: l = list(range(10000))

In [5]: %timeit chunk(l, 10)
16.1 ms ± 99.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: l = list(range(100000))

In [7]: %timeit chunk(l, 10)
1.91 s ± 14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The time grows roughly 100-fold for each 10-fold increase in list size, which is consistent with O(n^2) behaviour. In terms of time complexity, can someone suggest a better way? The data in the list is not numerical, so I can't use NumPy.

sprksh
  • `chunk_1, chunk_2 = some_list[:chunk_size], some_list[chunk_size:]` is not one slicing operation but two, each taking time O(k). – DYZ Sep 01 '20 at 05:11
  • 1
    To add to @DYZ, each operation has a different time complexity as k is different, `O(k_1)` and `O(k_2)` where `k_1` + `k_2` is the total length of the original list. By the way, the loop you wrote is wrong, `some_list` and `chunk` should be reversed. – Adirio Sep 01 '20 at 05:15
  • Maybe creating an iterator like `some_iter = iter(some_list)` and then using [`islice`](https://docs.python.org/3/library/itertools.html#itertools.islice)? (See the sketch after these comments.) – sabiwara Sep 01 '20 at 05:23
  • @Adirio, I fixed the mistake. Also, are you suggesting that the overall cost would be O(n^2)? I am not sure that's right. I did some profiling and the results suggest otherwise. Can you add some empirical evidence? – sprksh Sep 01 '20 at 05:32
  • 1
    No, the time complexity is `O(k_1) + O(k_2)` which should be close to `O(n)` not to `O(n^2)` – Adirio Sep 01 '20 at 05:36
  • 1
    @Adirio but that will be the time complexity of each step. So by your logic the complexity should be O(n*n/chunk_size) ~= O(n^2). I am talking about the while loop. – sprksh Sep 01 '20 at 05:38
  • If you are working on numerical data, it is probably more efficient to convert your lists into numpy ndarrays and use `numpy.split` or reshaping. – Jan Christoph Terasa Sep 01 '20 at 05:41
  • Oh, do you mean the time complexity of the whole loop? Yeah your numbers seem right. – Adirio Sep 01 '20 at 05:41
  • @JanChristophTerasa I was also thinking of some sort of `deque` that allows faster access from the front as an alternative. – Adirio Sep 01 '20 at 05:43
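
A minimal sketch of the iterator/islice approach suggested in the comments (the function name chunks_islice is just for illustration): it walks the list with a single iterator, so each element is copied only once overall instead of the tail being re-copied on every pass.

from itertools import islice

def chunks_islice(lst, n):
    """Yield successive n-sized chunks without re-slicing the tail each time."""
    it = iter(lst)
    while True:
        chunk = list(islice(it, n))  # take the next n items from the iterator
        if not chunk:                # iterator exhausted
            return
        yield chunk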

1 Answer

1

You can use a generator. A generator will be much more efficient, as it will yield the chunks one at a time:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
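
For example, one way to consume it (the variable names here are just for illustration):

data = list(range(10))
for chunk in chunks(data, 3):
    print(chunk)
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]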

See the original answer here: How do you split a list into evenly sized chunks?

Tasnuva Leeya
  • Doesn't even address the real question. I am concerned about the complexity of operations here. – sprksh Sep 01 '20 at 05:26
  • I have answered this part of your question: `"Is there a better way of breaking a large list into evenly sized chunks?"` And of course a generator will decrease both time and memory complexity. – Tasnuva Leeya Sep 01 '20 at 05:29
  • 2
    This will not decrease the time complexity, only the memory. – flakes Sep 01 '20 at 05:35