def read_large_file(file_handler, block_size=10000):
    block = []
    for line in file_handler:
        block.append(line)
        if len(block) == block_size:
            yield block
            block = []

    # don't forget to yield the last block
    if block:
        yield block

with open(path) as file_handler:
    for block in read_large_file(file_handler):
        print(block)

I am reading the code above, which was written by someone else. About these lines:

if len(block) == block_size:
   yield block
   block = []

Does block = [] ever get a chance to execute? I had thought yield works like a return statement. Also, why is there a final if block check after the loop?

Mureinik
ling
  • Yes. You can say `yield` is a way to pause execution and resume it again from the next line. Place a print statement to check if it gets executed. – Diptangsu Goswami Feb 10 '20 at 07:29
  • By adding `yield` to your function, the function becomes a generator function. For details you can check [here](https://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do) – abc Feb 10 '20 at 07:30
  • BTW, the function will only yield if the size is exactly `block_size`. It might be better to use `if len(block) >= block_size:`. – Matthias Feb 10 '20 at 07:43
  • Does https://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do answer the question? – Karl Knechtel Feb 10 '20 at 07:50

2 Answers


Yes, it will be executed when the function resumes on the next iteration. Remember, yield is like a pause button for a generator, and generators are usually used within a loop. The yield is sort of returning a value (I say "sort of" because yield is not the same as return), but when the generator is next accessed, it picks up at that same spot. The purpose of block = [] is to reset block to a new empty list before the next go-around. (block.clear() would also empty it, but it mutates the same list object that was just yielded, so a caller still holding that block would see it emptied; rebinding to a new list avoids that.)
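To see the pause-and-resume behaviour directly, here is a minimal sketch. The function `blocks_with_trace` and its `trace` parameter are hypothetical names added for illustration; the body otherwise mirrors the structure of `read_large_file` from the question.

```python
def blocks_with_trace(lines, block_size, trace):
    """Same structure as read_large_file, but records what runs and when."""
    block = []
    for line in lines:
        block.append(line)
        if len(block) == block_size:
            trace.append("yielding")
            yield block
            # Execution resumes here the next time the caller asks for a value.
            trace.append("resumed, resetting block")
            block = []
    if block:
        yield block


trace = []
gen = blocks_with_trace(["a", "b", "c"], block_size=2, trace=trace)

first = next(gen)
print(first, trace)   # ['a', 'b'] ['yielding']

second = next(gen)
print(second, trace)  # ['c'] ['yielding', 'resumed, resetting block']
```

After the first `next()` the reset has not run yet; it only runs once the second `next()` wakes the generator up again.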

This code builds up blocks of lines from a file and hands each one back to the caller as soon as it reaches block_size lines. The final if block yields whatever is left over when the number of lines is not an exact multiple of block_size.
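A small sketch of why the final check matters, using `io.StringIO` as a stand-in for a real file. The five-line input and the variant `read_without_final_check` are made up for illustration.

```python
import io


def read_large_file(file_handler, block_size=10000):
    block = []
    for line in file_handler:
        block.append(line)
        if len(block) == block_size:
            yield block
            block = []
    # Yield the leftover lines that did not fill a whole block.
    if block:
        yield block


def read_without_final_check(file_handler, block_size):
    # Hypothetical variant with the trailing "if block" removed.
    block = []
    for line in file_handler:
        block.append(line)
        if len(block) == block_size:
            yield block
            block = []


text = "l1\nl2\nl3\nl4\nl5\n"  # five lines, not a multiple of 2

sizes = [len(b) for b in read_large_file(io.StringIO(text), block_size=2)]
print(sizes)  # [2, 2, 1]

sizes_without = [len(b) for b in read_without_final_check(io.StringIO(text), block_size=2)]
print(sizes_without)  # [2, 2]
```

Without the trailing check, the fifth line is silently dropped.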

Z4-tier

yield produces the next output of the generator and suspends it. When the caller asks for another value, the generator continues from the statement right after the yield.

Here, lines are read into a block (a list of lines). Whenever a block holds block_size lines, it is yielded as the next value from the generator; then block is re-initialized to an empty list and reading continues.
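One detail about the re-initialization: `block = []` binds a brand-new list, so blocks the caller has already received are left alone. The sketch below contrasts that with emptying the list in place. Both function names are hypothetical, and `range(5)` stands in for the lines of a file.

```python
def rebinding_blocks(items, block_size):
    block = []
    for item in items:
        block.append(item)
        if len(block) == block_size:
            yield block
            block = []       # new list; the yielded one is untouched
    if block:
        yield block


def clearing_blocks(items, block_size):
    block = []
    for item in items:
        block.append(item)
        if len(block) == block_size:
            yield block
            block.clear()    # empties the very list the caller received
    if block:
        yield block


# list() keeps a reference to every yielded block.
kept_rebinding = list(rebinding_blocks(range(5), 2))
kept_clearing = list(clearing_blocks(range(5), 2))

print(kept_rebinding)  # [[0, 1], [2, 3], [4]]
print(kept_clearing)   # [[4], [4], [4]]
```

If the caller processes each block immediately and never stores it, as the `print(block)` loop in the question does, either form behaves the same. The difference only shows up when blocks are kept around.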

Mureinik