
I am trying to read a large file line by line while also writing to another large file, and I want to know the "best" way of doing so.

I found this Stack Overflow post for reading a large file line by line and want to know the proper way to also incorporate writing to a file. Is there anything better than nesting a second with open?

What I currently have:

 # args is parsed from the command line
 # file is an exogenous variable
 canWrite = False
 with open(args.inPath + file, "r") as fpIn:
     with open(args.outPath + file, "w") as fpOut:
         for line in fpIn:
             if re.match(some match): canWrite = True
             if re.match(some match 2): break
             if canWrite: fpOut.write(line)
noah
  • I'm assuming you may want to append to the output file; otherwise you will overwrite it every line – Chris May 31 '19 at 17:12
  • The file is only opened once, so it doesn't overwrite. It is just calling multiple write statements on the same file. – noah May 31 '19 at 17:24
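A quick self-contained check of noah's point (the file path here is illustrative, not from the question): mode "w" truncates the file once at open time, not on every write, so repeated writes within the same with block accumulate.

```python
import os
import tempfile

# Write twice to a file opened once in "w" mode.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w") as fp:   # truncation happens here, once
    fp.write("first\n")
    fp.write("second\n")      # appended within the same open handle

with open(path) as fp:
    contents = fp.read()
# contents is "first\nsecond\n" - nothing was overwritten
```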

2 Answers


You need not nest the with statements. A single with statement can use multiple context managers.

with open(args.inPath + file, "r") as fpIn, open(args.outPath + file, "w") as fpOut:
    for line in fpIn:
        if re.match(some match): canWrite = True
        if re.match(some match 2): break
        if canWrite: fpOut.write(line)

It's a bit cleaner.
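As a side note beyond the original answer: since Python 3.10 the context managers can also be wrapped in parentheses, which keeps long lines readable. A minimal self-contained sketch (paths are illustrative, and this form requires Python 3.10+):

```python
import os
import tempfile

# Set up a small sample input file so the example runs on its own.
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "in.txt")
out_path = os.path.join(tmpdir, "out.txt")
with open(in_path, "w") as f:
    f.write("alpha\nbeta\n")

# Python 3.10+: parentheses let one with statement span several lines.
with (
    open(in_path, "r") as fpIn,
    open(out_path, "w") as fpOut,
):
    for line in fpIn:
        fpOut.write(line)

with open(out_path) as f:
    copied = f.read()
# copied == "alpha\nbeta\n"
```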

rdas

yield is your best friend: via Lazy Method for Reading Big File in Python?

def read_in_chunks(file_object, chunk_size=1024):
  """Lazy function (generator) to read a file piece by piece.
  Default chunk size: 1k."""
  while True:
    data = file_object.read(chunk_size)
    if not data:
      break
    yield data

with open(args.inPath + file, "r") as f, open(args.outPath + file, "a") as fpOut:
  for chunk in read_in_chunks(f):
    if re.match(some match): canWrite = True
    if re.match(some match 2): break
    if canWrite: fpOut.write(chunk)

See also: https://www.pythoncentral.io/python-generators-and-yield-keyword/ , https://www.geeksforgeeks.org/use-yield-keyword-instead-return-keyword-python/

This will be much lighter on your memory footprint as well.
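For completeness, here is the generator above in a self-contained run (file path and sizes are illustrative). Note one caveat: chunks do not align with line boundaries, so per-chunk re.match can miss patterns that straddle a chunk edge.

```python
import os
import tempfile

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

# Build a 2500-byte sample file: it should come back as 1024 + 1024 + 452.
path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(path, "w") as f:
    f.write("x" * 2500)

with open(path) as f:
    chunks = list(read_in_chunks(f))
# [len(c) for c in chunks] == [1024, 1024, 452]
```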

order
  • OP is trying to read the file line by line - which is already implemented as an iterator in the file object. This answer is needlessly complicated. – rdas May 31 '19 at 17:21