I want to remove empty lines from a large text file with Python 3. I have a one-liner that works fine most of the time, but not for large files:

open('path_to/dest_file.csv', 'w').write(re.sub('\n\s*\n+', '\n', open('path_to/source_file.csv').read()))

For large files it sometimes fails with:

Traceback (most recent call last):
File "/scripts/dwh_common.py", line 261, in merge_files
    open(out_path, 'w').write(re.sub('\n\s*\n+', '\n', open(tmp_path).read()))
File "/usr/lib64/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError

I am aware that I could use sed instead, but I want to avoid calling OS executables if possible. I am also aware that I could split the file before processing it or add more main memory, but I would like to avoid that too. Does anyone have an idea how to do this more memory-efficiently in Python?

Edit: What makes this question different from others is that I am not asking how to delete all blank lines in a file with Python, but how to do so in a more memory-efficient way.

Answer: As pointed out by @not_a_robot and @bruno desthuilliers, reading the file line by line instead of loading the whole file into memory solved the issue. I used the answer from this question:

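with open(tmp_path) as f, open(out_path, 'w') as outfile:
    # Iterate over the file object itself; f.readlines() would load every line into a list first
    for line in f:
        # Write only lines that contain something other than whitespace
        if line.strip():
            outfile.write(line)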
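
The same line-by-line idea can be wrapped in a small reusable function. This is only a sketch under a few assumptions: the name `remove_blank_lines`, the `encoding` parameter with its UTF-8 default, and the returned count of dropped lines are illustrative and not part of the original post. Passing `newline=""` is a choice made here so that the original line endings are copied through unchanged.

```python
def remove_blank_lines(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, skipping empty and whitespace-only lines.

    Hypothetical helper: streams the file so only one line is held in
    memory at a time. Returns the number of lines that were dropped.
    """
    dropped = 0
    # newline="" disables newline translation, so "\n" and "\r\n" are preserved as-is
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            if line.strip():
                # Line has non-whitespace content: keep it, including its line ending
                dst.write(line)
            else:
                dropped += 1
    return dropped
```

With the paths from the question, a call would look like `remove_blank_lines('path_to/source_file.csv', 'path_to/dest_file.csv')`. Like the regex version, this treats every physical line on its own, so a blank line inside a quoted CSV field would also be removed.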
  • Why not just read line-by-line, check whether each line is not empty, and keep only those lines that aren't empty? You don't *need* to read the entire file in at once. Something like `with open('file.csv') as f: for line in f: if line != ''` blah, blah, blah... – blacksite Apr 20 '17 at 12:58
  • The memory-efficient way to work with files is very obviously to NOT read whole files in memory. – bruno desthuilliers Apr 20 '17 at 13:29
  • Processing the file line by line instead of loading the whole file into memory solved the problem. – stack_lech Apr 20 '17 at 13:54
