It is interesting that a question like this has not been asked before on SO.

I am recording lines of data into a text file in Python 2. What I would like to do is erase a line, identified by its line number, without the next line moving up to fill the gap: the erased line should just stay empty. (That way I don't have to write a new file each time I erase a line.)

So what I am asking is not one of these,

The basic concept is to change the contents of a specific line, which in this case means replacing them with an empty string.

There is a question which I did not fully understand, but which could contain an answer to my question. If so, please help me understand how.

If you think my question is a duplicate of this one, please explain the answer to me before flagging the question.

My research on the subject:

Edit: I even forgot to ask whether such a thing is feasible; I would appreciate any information.

Rockybilly
  • Would replacing the specific line with something like whitespace characters suffice? Otherwise you would have to translate backwards all of the bytes after the line in question. – fuglede Nov 06 '16 at 09:49
  • @fuglede I guess whitespace characters would suffice, however I now realize that for the behavior I am after, fixed byte length is needed. Just like in C. But I can check the length of each line and replace them with enough number of spaces. I still don't know how to accomplish that in Python though. – Rockybilly Nov 06 '16 at 09:55
  • As fuglede said, you _could_ replace the unwanted bytes with a whitespace byte, e.g. the space character (ASCII code 0x20). Traditionally, the [DEL character](https://en.wikipedia.org/wiki/Delete_character) (ASCII code 0x7f) has been used for this purpose. – PM 2Ring Nov 06 '16 at 09:58
  • I added a way to achieve that with Python in the answer below. – fuglede Nov 06 '16 at 10:05
  • Are the lines in your text file all the same length? If so, you can use the file `.seek` method to quickly jump to any desired line. – PM 2Ring Nov 06 '16 at 10:14
  • Another option here is to use a [memory-mapped file](https://docs.python.org/3/library/mmap.html). – PM 2Ring Nov 06 '16 at 10:35
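
The memory-mapped approach mentioned in the last comment could look roughly like the sketch below. This is only an illustration under some assumptions: the function name `erase_line_mmap` and the `fill` parameter are made up for this example, `fill` must be a single byte, and the file must be non-empty (an empty file cannot be mapped).

```python
import mmap

def erase_line_mmap(fname, line_num, fill=b' '):
    """Overwrite 1-based line `line_num` of `fname` with `fill` bytes,
    keeping its newline. Hypothetical helper; `fill` must be one byte."""
    with open(fname, 'r+b') as f:
        mm = mmap.mmap(f.fileno(), 0)      # map the whole file read/write
        try:
            # Walk past the newlines that end the preceding lines
            start = 0
            for _ in range(line_num - 1):
                nl = mm.find(b'\n', start)
                if nl == -1:
                    raise ValueError('no line %d in %s' % (line_num, fname))
                start = nl + 1
            if start >= len(mm):
                raise ValueError('no line %d in %s' % (line_num, fname))
            end = mm.find(b'\n', start)
            if end == -1:
                end = len(mm)              # last line has no trailing newline
            if end > start:
                # Same-length slice assignment: the file size never changes
                mm[start:end] = fill * (end - start)
            mm.flush()
        finally:
            mm.close()
```

The operating system still pages in the part of the file that gets scanned, so this does not avoid searching for the line; it only avoids explicit read and write calls.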

4 Answers

Here's a function that modifies a text file in-place, replacing the specified line by a line of the same length.

In this demo I use # as the replacement character to make it easier to see what's going on. You could use a simple space (chr(32)) instead, or the ASCII DEL character (chr(127) == \x7f). A benefit of using DEL is that it makes it a little easier to rapidly delete all of these "erased" lines because that character won't occur in any of the file's "proper" lines.

Firstly, here's a small text file to test this code with.

qdata

1 one
2 two
3 three
4 four
5 five
6 six
7 seven
8 eight
9 nine

Here's the code. Note that it uses 1-based line numbering.

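def erase_line(fname, line_num):
    ''' In-place replacement of line `line_num` in file `fname` with
        a line of DEL chars of the same length, retaining the newline.
    '''
    DEL = '#'
    with open(fname, 'r+') as f:
        # Skip the lines before the target (1-based numbering)
        for i in range(line_num - 1):
            f.readline()
        start = f.tell()
        line = f.readline()
        if not line:
            raise ValueError('%s has no line %d' % (fname, line_num))
        # Overwrite the text but keep the newline, if the line has one
        body = line.rstrip('\n')
        f.seek(start)
        f.write(DEL * len(body) + line[len(body):])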

erase_line('qdata', 3)

Here's the modified version of qdata:

1 one
2 two
#######
4 four
5 five
6 six
7 seven
8 eight
9 nine

Because it has to deal with lines of varying lengths, erase_line has to read all of the lines until it finds the desired one. However, it only rewrites that one line and doesn't modify any others, so it should be fairly quick. If your lines were of fixed length we could use .seek to immediately jump to the desired line.
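
For the fixed-length case, a minimal sketch might look like this. It assumes every line in the file occupies exactly `record_len` bytes including its newline; the names `erase_fixed_line`, `record_len` and `fill` are hypothetical and only for illustration.

```python
def erase_fixed_line(fname, line_num, record_len, fill=b' '):
    """Blank out 1-based line `line_num` in a file whose lines are all
    exactly `record_len` bytes long, trailing newline included.
    Hypothetical helper; `fill` must be a single byte."""
    offset = (line_num - 1) * record_len
    with open(fname, 'r+b') as f:
        f.seek(0, 2)                       # move to the end to learn the size
        if offset + record_len > f.tell():
            raise ValueError('%s has no line %d' % (fname, line_num))
        f.seek(offset)                     # jump straight to the target line
        f.write(fill * (record_len - 1) + b'\n')
```

Nothing before or after the target line is read, so the cost does not depend on where the line sits in the file.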


Here's a function that will strip any lines that consist entirely of the DEL character, writing the result to a new file.

def compact(oldname, newname):
    ''' Copy file `oldname` to `newname`, removing lines that
        consist entirely of the DEL char, apart from the '\n'
    '''
    DEL = '#'
    with open(oldname, 'r') as fin, open(newname, 'w') as fout:
        for line in fin:
            if line.lstrip(DEL) != '\n':
                fout.write(line)

compact('qdata', 'qdata.new')

qdata.new

1 one
2 two
4 four
5 five
6 six
7 seven
8 eight
9 nine

Finally, here's a Unix / Linux pipeline that performs the compacting operation, assuming you're using the actual DEL character (which is \177 in octal). It's probably faster than my Python version.

tr -d '\177' <qdata | awk '!/^$/' >qdata.new
PM 2Ring

Is something like this what you're after?

def remove_line_from_file(filename, line_number):
    with open(filename) as f:
        lines = f.readlines()
    lines[line_number - 1] = '\n'  # <- or whatever kind of newline is relevant for your system
    with open(filename, 'w') as f:
        f.writelines(lines)

Then, if the contents of the file test are

line 1
line 2
line 3

running remove_line_from_file('test', 2) will turn test into

line 1

line 3

Update, now that I actually read the question properly: This method modifies the file in place, replacing the contents of the line with whitespace characters:

def remove_line_from_file(filename, line_number):
    with open(filename, 'r+') as f:
        count = 0
        start = 0  # line 1 begins at byte offset 0
        bytes_read = 0
        while True:
            bytes_read += 1
            this_byte = f.read(1)
            if not this_byte:
                break
            if this_byte == '\n':
                count += 1
                if count == line_number - 1:
                    start = bytes_read
                elif count == line_number:
                    f.seek(start)
                    f.write(' ' * (bytes_read - start - 1))
                    break

Going by PM 2Ring's comment above, it would also make sense to use chr(127) instead of ' '.

fuglede
  • Thank you for your answer. Your approach would work, however this is exactly what I am trying to avoid. Reading all data and writing it all together. – Rockybilly Nov 06 '16 at 09:41
  • Ah, sorry, I missed that part of your question. Indeed, `file.seek` should be what you're after then. https://stackoverflow.com/questions/1877999/delete-final-line-in-file-via-python/10289740#10289740 does something along those lines (but for the final line only, which means that it cannot be adapted directly). – fuglede Nov 06 '16 at 09:42
  • The benefit of using the DEL char is that you can rapidly "compact" the file, i.e., copy the file, removing all the DELs, either with a Python script, or with the standard *nix `tr` utility. – PM 2Ring Nov 06 '16 at 10:19

You're right, the fileinput module is exactly what you need:

import fileinput
def blank_line(filename, lineno):
    f = fileinput.input(files=[filename], inplace=True)
    for line in f:
        if fileinput.lineno() == lineno: # note: line numbers start at 1, not 0
            line = ""
        print line.rstrip("\n") # Output is redirected to the current line of the file
    f.close()

Note that Python 3 has a few advantages here: fileinput supports context managers (with statements), and the new print() function allows us to preserve lines exactly as they are (instead of always adding either a newline or a space at the end).
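
A Python 3 version along those lines might look like the sketch below. It is Python 3 only (it relies on the context-manager support and on `print(..., end="")`), and like the version above it still rewrites the whole file behind the scenes.

```python
import fileinput

def blank_line(filename, lineno):
    """Replace 1-based line `lineno` with an empty line (Python 3 only)."""
    with fileinput.input(files=[filename], inplace=True) as f:
        for line in f:
            if f.lineno() == lineno:
                line = "\n"
            # stdout is redirected into the file; end="" keeps each
            # line exactly as it was read
            print(line, end="")
```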

Tim Pietzcker
  • But the OP says they don't want to "write a new file each time I erase a line". They only want to modify the single line that they want to erase. – PM 2Ring Nov 06 '16 at 10:32
  • I'd like to save the CPU cycles rather than memory. I will clean the file as the lines increase after some point of course. – Rockybilly Nov 06 '16 at 10:37

You should understand how text files on most systems are stored on disk or other storage media.

While the details differ between systems, more or less all of them today have the concept of fixed-size "blocks". Files are allocated in those blocks, and a text file is just a sequence of characters, some of which are 0x0A newline codes(*).

Let's say for example that a block is 32 bytes (they're normally bigger than that, but just to make diagrams easier to read).

 _______text file logical content________
|Hello, world¶                           |
|This is a text file that contains¶      |
|three lines¶____________________________|

 _______________________a 32 bytes block______________________
|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|
|H|e|l|l|o|,| |w|o|r|l|d|.|¶|T|h|i|s| |i|s| |a| |t|e|x|t| |f|i|
|l|e| |t|h|a|t| |c|o|n|t|a|i|n|s|¶|t|h|r|e|e| |l|i|n|e|s|¶|_|_|

As you see, the three lines take up two blocks, and the last two bytes of the second block are unused.

The file system will take care of not showing you those extra two bytes but the point is that the "lines" of the text file have nothing to do with the structure of the file on the disk: all the lines are written contiguously one after another with special new-line characters between them(**).

If for example you want to replace a line with another of the same exact length you could just update those few bytes. If instead the line is of a different length or if you want to delete or insert a new line the only solution is to actually rewrite the whole file from that point to the end.
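
As a rough sketch of that distinction (the function name `replace_span` and its parameters are hypothetical, and the caller is assumed to already know the byte offset and length of the line):

```python
def replace_span(fname, start, old_len, new):
    """Replace the `old_len` bytes at byte offset `start` with `new`.
    Hypothetical helper working on raw bytes."""
    with open(fname, 'r+b') as f:
        if len(new) == old_len:
            # Same length: only these bytes are touched
            f.seek(start)
            f.write(new)
            return
        # Different length: everything after the span has to move,
        # so the rest of the file is read and written back
        f.seek(start + old_len)
        tail = f.read()
        f.seek(start)
        f.write(new + tail)
        f.truncate()        # drop leftover bytes if the file got shorter
```

The first branch costs the same no matter how large the file is; the second grows with the amount of data after the edited span.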

(*) Little digression: MS-DOS long ago used, and Windows still uses today, the two characters 0x0D+0x0A to mark newlines because... well... no one knows for sure: it's a stupid stupid stupid inexcusable-even-back-then choice with no real good reasons that we'll all have to live with forever. This mistake of having two newline characters is at the root of the "binary-mode" madness.

(**) Second digression: There are even today very "common" file systems where text files have fixed-length lines instead of using line termination characters, but they're used only to store bank accounts, insurance policies and other absolutely vital information that is continuously shuffled by COBOL programs of which the source code was lost long ago and of which no one ever maintained any serious repository anyway. If this scares you then just ignore them and keep all your money under the mattress.

6502
  • Maybe what I need is the Python equivalent of a linked list (but still stored on the hard disk, in a file), because the structure I am going to use is exactly a deque. I wonder if there is such a datatype in operating systems. – Rockybilly Nov 06 '16 at 11:46