7

I have a function process_list that takes a list of unique IDs and sends the list to an API endpoint for processing. The endpoint accepts at most 100 elements per request.

If I have a list with more than 100 elements, how do I process the first 100, then the next 100, and so on until I reach n?

my_list = [232, 231, 932, 233, ... n]
# first 100
process_list(my_list[:100])


def process_list(my_list):
    url = 'https://api.example.com'
    data = {'update_list': my_list}
    headers = {'auth': auth}
    r = requests.put(url, data=json.dumps(data), headers=headers)
bayman
  • This has been answered many different times. My favourite is: https://stackoverflow.com/questions/434287/how-to-iterate-over-a-list-in-chunks – otocan Feb 16 '23 at 15:04

3 Answers

18

I am keeping this simple because I assume you are just starting with Python.

Iterate over the list, advancing the start index by a hundred on every iteration:

import json
import requests

# helper to process a sub list; defined first so it exists when the loop runs
def process_list(my_list):
    url = 'https://api.example.com'
    data = {'update_list': my_list}
    headers = {'auth': auth}  # auth is your API credential, defined elsewhere
    r = requests.put(url, data=json.dumps(data), headers=headers)

# builds a list of numbers from 0 thru 10122
my_list = list(range(10123))

# i will step through the indexes (not the items!) in the list,
# 100 at a time
for i in range(0, len(my_list), 100):
    # call our helper to process a sub list
    process_list(my_list[i:i+100])

You have two options on how to use range from the docs:

range(start, stop[, step])

or

range(stop)

Using the first option you iterate through the sequence 0, 100, 200, ...
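
To see what the three-argument form produces, here is a minimal sketch with a short list and a step of 3 standing in for 100. The chunks helper name is made up for this illustration and is not part of the code above.

```python
def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    # `chunks` is a hypothetical helper name used only for this illustration.
    for start in range(0, len(items), size):
        # Slicing past the end is safe: the last slice is simply shorter.
        yield items[start:start + size]


sample = list(range(7))
starts = list(range(0, len(sample), 3))   # [0, 3, 6]
batches = list(chunks(sample, 3))         # [[0, 1, 2], [3, 4, 5], [6]]
print(starts, batches)
```

The last batch holds only the leftover item, so no padding or special-casing is needed when the length is not a multiple of the step.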

MikeB
Vitor Falcão
  • Keeping it simple (as this answer does) is _always_ a good idea, regardless of the questioner's experience level ;) – Will Dec 12 '22 at 17:43
4

Here is a recipe from the itertools docs that may help:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Use it like this:

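def process_list(my_list):
    url = 'https://api.example.com'
    headers = {'auth': auth}
    # send the IDs in groups of 100
    for group in grouper(my_list, 100):
        # drop the None padding that zip_longest adds to the last group
        data = {'update_list': [item for item in group if item is not None]}
        r = requests.put(url, data=json.dumps(data), headers=headers)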

The call iter(iterable) creates an iterator object to walk over the list and [iter(iterable)] * n duplicates this single iterator object n times.

In the call zip_longest(*args, fillvalue=fillvalue), the *args causes each entry in the list to be passed as a separate argument to zip_longest. Thus the output of all the iterators will be zipped together.

As each tuple of zipped output is produced, zip_longest calls next() once on each of its n arguments (which are all the same iterator), thus removing n items in total from that iterator. Using the _longest variant with a fillvalue pads the final group when the input runs out partway through it, instead of dropping the leftover items.
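
The shared-iterator behaviour described above can be checked directly. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from itertools import zip_longest

letters = iter("ABCDEFG")
args = [letters] * 3          # three references to the SAME iterator object

# All three entries are one object, so advancing one advances them all.
same_object = all(a is letters for a in args)

# zip_longest pulls one item from each argument in turn; because they share
# state, each output tuple consumes three consecutive letters.
groups = list(zip_longest(*args, fillvalue="x"))
print(same_object, groups)
```

With fillvalue left at its default of None, the last group is padded with None, so those entries would need to be filtered out before a batch is sent anywhere.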

Josiah Yoder
Raymond Hettinger
  • 1
    I'm stumped. What in `[iter(iterable)] * n` causes each iterator to skip `n` elements? – DeepSpace Jun 09 '19 at 21:35
  • 2
    Can you please explain the solution? – qwerty Jun 13 '19 at 20:38
  • @DeepSpace I've explained it. Questions welcome. I still think the old slicing approach may be best if everything fits in memory. – Josiah Yoder Jun 22 '23 at 19:04
  • I think of iterators as being functional because they feel similar to a streaming framework that allows filtering, mapping, etc. But this example is certainly *not* functional. Iterating through a list of iterators most certainly has side-effects and this code relies on the behavior that all iterators are visited in each step of `zip_longest` before it creates its next element of output. I'm not sure how I feel about that, but I am using this code. – Josiah Yoder Jun 22 '23 at 19:16
1

You could also use:

for i in range((len(my_list)//100)+1):
    process_list(my_list[i*100:(1+i)*100])
Mig B
  • If the length of `my_list` is a multiple of 100, an empty list will be passed to `process_list` on the final iteration. Suggest `range((99 + len(my_list)) // 100)`, although the accepted answer is simpler. – Deepstop Dec 25 '21 at 15:53
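
The difference between the two batch counts discussed here can be checked with plain arithmetic. A minimal sketch; batch_count_floor_plus_one and batch_count_ceil are hypothetical names for the two expressions being compared.

```python
def batch_count_floor_plus_one(length, size=100):
    # Floor division plus one: always adds a batch, even with no remainder.
    return length // size + 1


def batch_count_ceil(length, size=100):
    # Ceiling division: rounds up only when there is a remainder.
    return (length + size - 1) // size


for length in (0, 99, 100, 101, 200):
    print(length, batch_count_floor_plus_one(length), batch_count_ceil(length))
```

The two agree whenever the length is not a multiple of the batch size. At exact multiples (including an empty list) the first expression yields one extra batch, and that extra batch is the empty slice.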