
I want to iterate over a list in fixed-size sublists. I currently do it with the following code:

def batchGenerator(samples, subsetSize):
    i=0
    while (i < (len(samples) - subsetSize + 1)):
        yield samples[i: i + subsetSize]
        i = i + subsetSize

Is there a standard library function that does the same thing?

I want to use it like:

for subl in batchGenerator(range(100), 10):
    print(max(subl))

Output:

9
19
29
39
49
59
69
79
89
99

Edit:

I want any trailing elements to be dropped when there are fewer than subsetSize of them, and I find @s3cur3's solution the most elegant for this case, compared to the solutions in a similar thread: [What is the most "pythonic" way to iterate over a list in chunks?](https://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks)

I would also prefer that the output keep the same type as the input (list, numpy.array, torch.Tensor, etc.).

user7867665
  • Possible duplicate of [What is the most "pythonic" way to iterate over a list in chunks?](https://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks) – Jacob Tomlinson Nov 21 '18 at 15:17
  • Yes, it's very similar, but the solutions there wouldn't work for me because I want to drop the trailing elements when there are fewer than `subsetSize` of them – user7867665 Nov 21 '18 at 15:40

1 Answer

How about:

def batchGenerator(samples, subsetSize):
    return (samples[i:i + subsetSize] for i in range(0, len(samples), subsetSize))

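The range() call here lets you iterate up to the length of your list, jumping subsetSize at a time (thus giving you an i of 0, 10, 20, ..., 90 in your example). Note that if len(samples) is not a multiple of subsetSize, the final slice will be shorter than subsetSize rather than being dropped.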
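Since the question asks for a short trailing batch to be dropped, here is a minimal sketch of one way to do that with the same slicing approach. The name batchGeneratorTruncated is made up for illustration. It assumes samples supports len() and slicing, which is also what should keep the slice type the same as the input type.

```python
def batchGeneratorTruncated(samples, subsetSize):
    """Yield consecutive slices of exactly subsetSize items, dropping a short tail.

    Hypothetical helper name; assumes `samples` supports len() and slicing.
    """
    # Only start indices that leave room for a full slice are generated.
    stop = len(samples) - subsetSize + 1
    return (samples[i:i + subsetSize] for i in range(0, stop, subsetSize))


batches = list(batchGeneratorTruncated(list(range(25)), 10))
print([max(b) for b in batches])  # prints [9, 19]; items 20..24 are dropped
```

The only change from the one-liner above is the stop value passed to range(), which mirrors the `len(samples) - subsetSize + 1` bound used in the question's while loop.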

Edited to respond to comment:

If you want to allow the input to be a list-of-lists, you'd need to use generator syntax like this:

def batchGenerator(listOfSampleLists, subsetSize):
    for sampleList in listOfSampleLists:
        for i in range(0, len(sampleList), subsetSize):
            yield sampleList[i:i+subsetSize]
s3cur3
  • Yes this works too. Will it be less memory efficient without `yield`? – user7867665 Nov 21 '18 at 15:41
  • I also prefer my function because it works more generally, e.g. for a `numpy` array or a PyTorch `Tensor`, without changing the type to a `list` – user7867665 Nov 21 '18 at 15:44
  • Ah, right you are! I've edited the answer to return a generator instead of a list. That should be compatible with the other (non-`list`) types you mentioned, I believe. – s3cur3 Nov 21 '18 at 15:47
  • This is very nice! Considering there is no existing library function that does this already, this solution looks the most elegant. I will mark it as the correct answer – user7867665 Nov 21 '18 at 15:57
  • How can we generalise this for multiple inputs? Like batchGenerator((samples_features, samples_label), subsetSize) – user7867665 Nov 27 '18 at 17:12
  • Thanks, I modified your code for the functionality I wanted, and suggested an edit. – user7867665 Nov 28 '18 at 12:57
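
For the multiple-input case raised in the comments, one possible approach is to slice every input with the same indices and yield the slices together as a tuple. This is only a sketch and not the edit that was suggested in the thread. The name multiBatchGenerator is hypothetical, and it assumes all inputs have the same length and that a short trailing batch should be dropped.

```python
def multiBatchGenerator(sequences, subsetSize):
    """Yield tuples of aligned slices, one slice per input sequence.

    Hypothetical helper; assumes every sequence supports len() and slicing.
    """
    length = len(sequences[0])
    # Assumption: misaligned inputs are an error rather than silently truncated.
    if any(len(s) != length for s in sequences):
        raise ValueError("all input sequences must have the same length")
    # Same bound as the question's loop: a short trailing batch is dropped.
    for i in range(0, length - subsetSize + 1, subsetSize):
        yield tuple(s[i:i + subsetSize] for s in sequences)


features = list(range(10))
labels = [x % 2 for x in features]
pairs = list(multiBatchGenerator((features, labels), 4))
for batchFeatures, batchLabels in pairs:
    print(batchFeatures, batchLabels)
```

Because each input is sliced independently, each slice keeps the type of its own input.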