I have large files named e.g. XXX_USR.txt. I iterate through a folder, and some of the txt files are over 500 MB. To avoid a MemoryError, I need to modify the files line by line. However, my current method is way too slow. The first line should be prefixed with |SYS and every other line with '| ' + amendtext, where amendtext is a variable holding the part of the .txt file name before the first underscore, e.g. "XXX".
File: XXX_USR.txt
INPUT:
| name | car |
--------------
| Paul |Buick|
|Ringo |WV |
|George|MG |
| John |BMW |
DESIRED OUTPUT:
|SYS | name | car |
--------------------
| XXX | Paul |Buick|
| XXX |Ringo |WV |
| XXX |George|MG |
| XXX | John |BMW |
My current code avoids the memory error but is way too slow:
import os
import glob
from pathlib import Path

cwd = 'C:\\Users\\EricClapton\\'
directory = cwd
txt_files = os.path.join(directory, '*.txt')
for txt_file in glob.glob(txt_files):
    cpath = Path(txt_file).resolve().stem
    nametxt = "-".join(cpath.split('_')[0:1])
    amendtext = "| " + nametxt
    systext = "| SYS"
    with open(txt_file, 'r', errors='ignore') as f:
        get_all = f.readlines()
    with open(txt_file, 'w') as f:
        for i, line in enumerate(get_all, 1):
            if i == 1:
                f.writelines(systext + line)
            else:
                f.writelines(amendtext + line)
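For comparison, here is a minimal sketch of a streaming variant that never holds the whole file in memory: it reads one line at a time, writes the prefixed line to a temporary file in the same folder, and then swaps that file into place. The function name prefix_lines and the header_prefix parameter are hypothetical, and extending the dashed rule line with extra dashes is an assumption made only to match the desired output above. It is an illustration of the idea under those assumptions, not a measured fix.

```python
import os
import tempfile
from pathlib import Path


def prefix_lines(txt_file, header_prefix="|SYS "):
    """Prefix every line of txt_file without loading the file into memory.

    Hypothetical helper: name, signature and rule-line handling are assumptions.
    """
    path = Path(txt_file)
    # Text before the first underscore, e.g. "XXX" for "XXX_USR.txt".
    name = path.stem.split("_", 1)[0]
    row_prefix = "| " + name + " "

    # Create the temp file next to the source so the final swap stays
    # on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with open(path, "r", errors="ignore") as src, os.fdopen(fd, "w") as dst:
            # Iterating over the file object yields one line at a time.
            for i, line in enumerate(src):
                stripped = line.strip()
                if i == 0:
                    dst.write(header_prefix + line)
                elif stripped and set(stripped) == {"-"}:
                    # Assumption: widen a dashed rule instead of prefixing it.
                    dst.write("-" * len(row_prefix) + line)
                else:
                    dst.write(row_prefix + line)
        # Replace the original only after the new file is fully written.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Inside the existing glob loop, calling prefix_lines(txt_file) would take the place of the readlines/writelines pair. Writing to a separate file first also means an interrupted run leaves the original untouched.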