2

I have a simple for loop in a Python script:

for filename in filenames:
    outline = getinfo(filename)
    outfile.write(outline)

This for loop is part of a larger script that extracts data from HTML pages. I have nearly 6GB of HTML pages and want to do some test runs before I try it on all of them.

How can I make the loop break after a set number of iterations (let's say 100)?

Karl Knechtel
user2475523

4 Answers

12
for filename in filenames[:100]:
    outline = getinfo(filename)
    outfile.write(outline)

The list slice filenames[:100] produces a new list containing just the first 100 file names (or all of them, if there are fewer than 100). The original list is left unchanged.
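Slicing only works when filenames is a sequence such as a list. If it were a generator or some other lazy iterable, a similar effect could be had with itertools.islice from the standard library. This is a minimal sketch of that alternative, not part of the answer above; the helper first_n and the generated file names are hypothetical stand-ins:

```python
from itertools import islice

def first_n(iterable, n):
    """Return an iterator over at most n items from any iterable."""
    return islice(iterable, n)

# Hypothetical stand-in for a lazily produced sequence of file names.
filenames = ("page_%d.html" % i for i in range(1000))

# Only the first 100 names are ever pulled from the generator.
limited = list(first_n(filenames, 100))
```

In the original loop this would read `for filename in islice(filenames, 100):`, with the loop body unchanged.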

kqr
9

Keep a counter for your for loop. When the counter reaches 100, break out of the loop:

counter = 0
for filename in filenames:
    if counter == 100:
        break
    outline = getinfo(filename)
    outfile.write(outline)
    counter += 1
waldol1
  • The preferred way to keep a counter is to do `for (counter, filename) in enumerate(filenames)`. – kqr Jun 11 '13 at 17:19
2

I like @kqr's answer, but here is another approach to consider: instead of taking the first 100, you could take a random sample of n files:

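from random import sample
# Process 10 randomly chosen files (without repeats) instead of the first ones
for filename in sample(filenames, 10):
    outline = getinfo(filename)
    outfile.write(outline)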
Jon Clements
  • I would argue this is the better solution as long as it doesn't have any terrible performance issues. – kqr Jun 11 '13 at 17:28
  • @kqr The main issue I would be worried about is reproducibility. So maybe the compromise is to take 1 in n instead, which could be done nicely with slicing as shown in your answer, and would still be more useful for testing. – Jon Clements Jun 11 '13 at 17:30
  • Yes, that's what I thought too, but discarded since it would probably have similar performance to taking a random sample. I didn't think about testability, but you are indeed correct. – kqr Jun 11 '13 at 17:34
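The two ideas raised in these comments can be sketched as follows: taking every n-th file with an extended slice, and making a random sample repeatable by seeding the generator. This is only an illustration under assumptions; the file names, the step of 10, and the seed value 42 are arbitrary choices that do not come from the thread:

```python
import random

# Hypothetical stand-in for the real list of file names.
filenames = ["page_%d.html" % i for i in range(1000)]

# Option 1: every 10th file via an extended slice; deterministic by construction.
every_tenth = filenames[::10]

# Option 2: a random sample that is repeatable because the generator is seeded.
rng = random.Random(42)  # the seed value is arbitrary
subset = rng.sample(filenames, 10)
```

Re-running the script with the same seed and the same list yields the same subset, which addresses the reproducibility concern while still drawing files from across the whole collection.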
1

Use the built-in function enumerate(), available in both Python 2 and 3.

for idx, filename in enumerate(filenames):
    if idx == 100:
        break
    outline = getinfo(filename)
    outfile.write(outline)
