
I am trying to read a large file line by line while also writing to another large file, and I want to know the "best" way of doing so.

I found this Stack Overflow post for reading a large file line by line and want to know the proper way to also incorporate writing to a file. Is there anything better than nesting a second with open?

What I currently have:

 # args is parsed from the command line
 # file is an exogenous variable
 canWrite = False
 with open(args.inPath + file, "r") as fpIn:
     with open(args.outPath + file, "w") as fpOut:
         for line in fpIn:
             if re.match(some match): canWrite = True
             if re.match(some match 2): break
             if canWrite: fpOut.write(line)
noah
  • I'm assuming you may want to append to the output file; otherwise you will overwrite it every line – Chris May 31 '19 at 17:12
  • The file is only opened once, so it doesn't overwrite. It is just calling multiple write statements on the same file. – noah May 31 '19 at 17:24
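A quick self-contained check of noah's point (the file path here is illustrative, not from the question): mode "w" truncates the file once at open time, not on every write, so repeated writes within the same with block accumulate.

```python
import os
import tempfile

# Write twice to a file opened once in "w" mode.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w") as fp:   # truncation happens here, once
    fp.write("first\n")
    fp.write("second\n")      # appended within the same open handle

with open(path) as fp:
    contents = fp.read()
# contents is "first\nsecond\n" - nothing was overwritten
```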

2 Answers


You need not nest the with statements. A single with statement can use multiple context managers.

with open(args.inPath + file, "r") as fpIn, open(args.outPath + file, "w") as fpOut:
    for line in fpIn:
        if re.match(some match): canWrite = True
        if re.match(some match 2): break
        if canWrite: fpOut.write(line)

It's a bit cleaner.
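As a side note beyond the original answer: since Python 3.10 the context managers can also be wrapped in parentheses, which keeps long lines readable. A minimal self-contained sketch (paths are illustrative, and this form requires Python 3.10+):

```python
import os
import tempfile

# Set up a small sample input file so the example runs on its own.
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "in.txt")
out_path = os.path.join(tmpdir, "out.txt")
with open(in_path, "w") as f:
    f.write("alpha\nbeta\n")

# Python 3.10+: parentheses let one with statement span several lines.
with (
    open(in_path, "r") as fpIn,
    open(out_path, "w") as fpOut,
):
    for line in fpIn:
        fpOut.write(line)

with open(out_path) as f:
    copied = f.read()
# copied == "alpha\nbeta\n"
```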

rdas

yield is your best friend: via Lazy Method for Reading Big File in Python?

def read_in_chunks(file_object, chunk_size=1024):
  """Lazy function (generator) to read a file piece by piece.
  Default chunk size: 1k."""
  while True:
    data = file_object.read(chunk_size)
    if not data:
      break
    yield data

with open(args.inPath + file, "r") as f, open(args.outPath + file, "a") as fpOut:
  for chunk in read_in_chunks(f):
    if re.match(some match): canWrite = True
    if re.match(some match 2): break
    if canWrite: fpOut.write(chunk)

See also: https://www.pythoncentral.io/python-generators-and-yield-keyword/ , https://www.geeksforgeeks.org/use-yield-keyword-instead-return-keyword-python/

This will be much lighter on your memory footprint as well.
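For completeness, here is the generator above in a self-contained run (file path and sizes are illustrative). Note one caveat: chunks do not align with line boundaries, so per-chunk re.match can miss patterns that straddle a chunk edge.

```python
import os
import tempfile

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

# Build a 2500-byte sample file: it should come back as 1024 + 1024 + 452.
path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(path, "w") as f:
    f.write("x" * 2500)

with open(path) as f:
    chunks = list(read_in_chunks(f))
# [len(c) for c in chunks] == [1024, 1024, 452]
```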

order
  • OP is trying to read the file line by line - which is already implemented as an iterator in the file object. This answer is needlessly complicated. – rdas May 31 '19 at 17:21