Here's a function that modifies a text file in place, replacing the specified line with a line of the same length.

In this demo I use `#` as the replacement character to make it easier to see what's going on. You could use a simple space (`chr(32)`) instead, or the ASCII DEL character (`chr(127)` == `'\x7f'`). A benefit of using DEL is that it makes it a little easier to rapidly delete all of these "erased" lines, because that character normally won't occur in any of the file's "proper" lines.
Firstly, here's a small text file, `qdata`, to test this code with:

```
1 one
2 two
3 three
4 four
5 five
6 six
7 seven
8 eight
9 nine
```
Here's the code. Note that it uses 1-based line numbering.
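```python
def erase_line(fname, line_num):
    ''' In-place replacement of line `line_num` (1-based) in file `fname` with
        a line of DEL chars of the same length, retaining the newline.
    '''
    DEL = b'#'  # stand-in for the real DEL char, b'\x7f', in this demo
    # Binary mode, so lengths are counted in bytes even for non-ASCII text
    with open(fname, 'r+b') as f:
        # Skip the lines before the target line
        for _ in range(line_num - 1):
            f.readline()
        start = f.tell()
        line = f.readline()
        if not line:
            raise ValueError(f'{fname} has fewer than {line_num} lines')
        # Overwrite everything except the line ending (b'\n' or b'\r\n')
        body = line.rstrip(b'\r\n')
        f.seek(start)
        f.write(DEL * len(body) + line[len(body):])

erase_line('qdata', 3)
```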
Here's the modified version of `qdata`:

```
1 one
2 two
#######
4 four
5 five
6 six
7 seven
8 eight
9 nine
```
Because it has to deal with lines of varying lengths, `erase_line` has to read every line up to the desired one, but it only rewrites that line and doesn't touch any others, so it should be fairly quick. If your lines were of fixed length, we could use `.seek` to jump directly to the desired line.
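With fixed-length lines, the byte offset of line `n` is simply `(n - 1) * line_len`, so there's no need to read the earlier lines at all. Here's a minimal sketch of that idea. The name `erase_line_fixed` and its `line_len` parameter (the length of every line in bytes, including the trailing `\n`) are my own hypothetical choices, and it assumes `\n` line endings and binary mode.

```python
def erase_line_fixed(fname, line_num, line_len, DEL=b'#'):
    ''' Hypothetical fixed-length variant: erase line `line_num` (1-based)
        in a file where every line is exactly `line_len` bytes, including b'\n'.
    '''
    with open(fname, 'r+b') as f:
        # Jump straight to the start of the target line
        start = (line_num - 1) * line_len
        f.seek(start)
        line = f.read(line_len)
        # Sanity check: the line must exist and end where we expect
        if len(line) != line_len or not line.endswith(b'\n'):
            raise ValueError(f'line {line_num} is missing or not {line_len} bytes long')
        f.seek(start)
        f.write(DEL * (line_len - 1) + b'\n')
```

For example, in a file containing `aaa\nbbb\nccc\n`, `erase_line_fixed(path, 2, 4)` turns the second line into `###`.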
Here's a function that will strip any lines that consist entirely of the DEL character, writing the result to a new file.
```python
def compact(oldname, newname):
    ''' Copy file `oldname` to `newname`, removing lines that
        consist entirely of the DEL char, apart from the '\n'
    '''
    DEL = '#'  # must match the char used by erase_line
    with open(oldname, 'r') as fin, open(newname, 'w') as fout:
        for line in fin:
            body = line.rstrip('\n')
            # Skip erased lines, but keep genuinely blank lines
            if body and body.strip(DEL) == '':
                continue
            fout.write(line)

compact('qdata', 'qdata.new')
```
Here's `qdata.new`:

```
1 one
2 two
4 four
5 five
6 six
7 seven
8 eight
9 nine
```
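If you'd rather compact `qdata` itself than keep a separate `qdata.new`, one option (not part of the code above; `compact_in_place` is a hypothetical name) is to write to a temporary file in the same directory and then rename it over the original with `os.replace`. The original is only replaced once the copy is complete, so a failure part-way through leaves it untouched. This sketch uses the same `#` stand-in for DEL.

```python
import os
import shutil
import tempfile

def compact_in_place(fname, DEL='#'):
    ''' Hypothetical helper: remove erased lines (all DEL chars apart from
        the '\n') from `fname` via a temporary file and an atomic rename.
    '''
    dirname = os.path.dirname(os.path.abspath(fname))
    # Same directory, so os.replace is a plain rename on the same filesystem
    fd, tmpname = tempfile.mkstemp(dir=dirname, text=True)
    try:
        with open(fd, 'w') as fout, open(fname, 'r') as fin:
            for line in fin:
                body = line.rstrip('\n')
                if body and body.strip(DEL) == '':
                    continue  # skip erased line
                fout.write(line)
        # mkstemp creates the file with mode 0600; keep the original permissions
        shutil.copymode(fname, tmpname)
        os.replace(tmpname, fname)
    except BaseException:
        os.remove(tmpname)
        raise
```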
Finally, here's a Unix / Linux pipeline that performs the compacting operation, assuming you're using the actual DEL character (which is `\177` in octal). It's probably faster than my Python version, but note that it also removes any blank lines that were already in the file.

```sh
tr -d '\177' <qdata | awk '!/^$/' >qdata.new
```