
Suppose I have a text file that goes like this:

AAAAAAAAAAAAAAAAAAAAA              #<--- line 1
BBBBBBBBBBBBBBBBBBBBB              #<--- line 2
CCCCCCCCCCCCCCCCCCCCC              #<--- line 3
DDDDDDDDDDDDDDDDDDDDD              #<--- line 4
EEEEEEEEEEEEEEEEEEEEE              #<--- line 5
FFFFFFFFFFFFFFFFFFFFF              #<--- line 6
GGGGGGGGGGGGGGGGGGGGG              #<--- line 7
HHHHHHHHHHHHHHHHHHHHH              #<--- line 8


Ignore the "#<--- line ..." annotations; they are only there for demonstration.


Assumptions

  • I don't know what line 3 is going to contain (because it changes all the time)...
  • The first 2 lines have to be deleted...
  • After the first 2 lines, I want to keep 3 lines...
  • Then, I want to delete every line after those 3 kept lines (i.e. everything after line 5).


End Result
The end result should look like this:

CCCCCCCCCCCCCCCCCCCCC              #<--- line 3
DDDDDDDDDDDDDDDDDDDDD              #<--- line 4
EEEEEEEEEEEEEEEEEEEEE              #<--- line 5


Lines deleted: First 2 + Everything after the next 3 (i.e. after line 5)

Required
All Pythonic suggestions are welcome! Thanks!




Reference Material
https://thispointer.com/python-how-to-delete-specific-lines-in-a-file-in-a-memory-efficient-way/

import os


def delete_multiple_lines(original_file, line_numbers):
    """Delete the lines of a file whose zero-based index is in line_numbers"""
    is_skipped = False
    counter = 0
    # Create name of dummy / temporary file
    dummy_file = original_file + '.bak'
    # Open original file in read only mode and dummy file in write mode
    with open(original_file, 'r') as read_obj, open(dummy_file, 'w') as write_obj:
        # Line by line copy data from original file to dummy file
        for line in read_obj:
            # If current line number exist in list then skip copying that line
            if counter not in line_numbers:
                write_obj.write(line)
            else:
                is_skipped = True
            counter += 1

    # If any line is skipped then rename dummy file as original file
    if is_skipped:
        os.remove(original_file)
        os.rename(dummy_file, original_file)
    else:
        os.remove(dummy_file)


Then...

delete_multiple_lines('sample.txt', [0,1,2])


The problem with this method might be that, if your file had 1-100 lines on top to delete, you'll have to specify [0,1,2...100]. Right?
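
Not necessarily. The function only tests `counter not in line_numbers`, and a `range` supports that membership test, so a range (or a set built from several ranges) can stand in for the hand-written list. The sketch below shows the idea on an in-memory list rather than a file; `filter_lines` and `sample` are hypothetical names introduced only for illustration.

```python
def filter_lines(lines, line_numbers):
    """Return the lines whose zero-based index is NOT in line_numbers."""
    return [line for i, line in enumerate(lines) if i not in line_numbers]


# Eight sample lines standing in for the file contents.
sample = [f"line {n}\n" for n in range(1, 9)]

# Delete indices 0-1 and 5-7 without spelling out each index.
to_delete = set(range(0, 2)) | set(range(5, len(sample)))
kept = filter_lines(sample, to_delete)
print(kept)
```

By the same reasoning, passing `range(0, 101)` as `line_numbers` to `delete_multiple_lines` should cover indices 0 through 100 without typing them out.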


Answer
Courtesy of @sandes

The following code will:

  • skip the first 63 lines
  • keep the next 95 lines
  • ignore the rest
  • create a new file


with open("sample.txt", "r") as f:
    lines = f.readlines()

skip, keep = 63, 95
# Skip the first 63 lines, then keep the next 95 (zero-based indices 63 to 157).
idx_lines_wanted = range(skip, skip + keep)
new_lines = []
for i, line in enumerate(lines):
    if i >= idx_lines_wanted.stop:
        break  # everything past the wanted range is ignored
    if i in idx_lines_wanted:
        new_lines.append(line)

with open("sample2.txt", "w") as f:
    for line in new_lines:
        f.write(line)
  • Don't just post a problem and ask people to write code for you. Let's see yours (or search first, in case it's already answered). – Kenny Ostrom Mar 17 '20 at 20:16
  • Does this answer your question? [using Python for deleting a specific line in a file](https://stackoverflow.com/questions/4710067/using-python-for-deleting-a-specific-line-in-a-file) – Kenny Ostrom Mar 17 '20 at 20:18
  • Hi Kenny, thanks for your response! I only have the code for extracting the data (which doesn't apply here). Have visited the post before, but unfortunately, I don't have a specific "nickname" or "item" for reference, only the number of the lines to keep/delete. I'm really new to this, and any suggestion would help greatly :) – shongyang low Mar 17 '20 at 20:22

2 Answers


EDIT: now iterating directly over f, based on @Kenny's comment and @chepner's suggestion.

with open("your_file.txt", "r") as f:
    new_lines = []
    for idx, line in enumerate(f):
        if idx >= 5:
            break  # nothing past index 4 is wanted
        if idx >= 2:  # keeps zero-based indices 2, 3, 4 (lines 3 to 5)
            new_lines.append(line)

with open("your_new_file.txt", "w") as f:
    for line in new_lines:
        f.write(line)
sandes
  • Hi sandes, thank you! To clarify, if we needed more than just 3 lines, e.g. 100/200 lines (therefore, 2-102/202)... Do we still need to [2,3,4,5...102/202]? – shongyang low Mar 17 '20 at 20:44
  • One improvement: there is no need to read the entire file into memory; you can iterate directly over `f` itself, and write each line to the output as you go, rather than accumulating lines to keep in memory. – chepner Mar 17 '20 at 21:42
  • it's probably more common to write to a new file, and then when that has succeeded, you delete the original file and rename the new file to the original file. – Kenny Ostrom Mar 17 '20 at 23:24
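
The last two comments can be combined into one sketch: stream the wanted lines straight into a temporary file, then swap it over the original once writing has succeeded. This is only one possible arrangement; `keep_line_range`, `skip` and `keep` are hypothetical names, and the temporary file is assumed to live in the same directory as the original so that `os.replace` stays on one filesystem.

```python
import os
import tempfile
from itertools import islice


def keep_line_range(path, skip, keep):
    """Rewrite path so it holds only `keep` lines after the first `skip`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file next to the original, so the final swap is a plain rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dst, open(path, "r") as src:
            # islice reads lazily and stops after skip + keep lines,
            # so the whole file is never held in memory.
            dst.writelines(islice(src, skip, skip + keep))
        # Only replace the original after the new file is fully written.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

For the file in the question this would be called as `keep_line_range("sample.txt", skip=2, keep=3)`. One caveat: `mkstemp` creates the file with restrictive permissions, so the rewritten file may not keep the original's mode.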

This is really something that's better handled by an actual text editor.

import subprocess

subprocess.run(['ed', original_file], input=b'1,2d\n+3,$d\nwq\n')

A crash course in ed, the POSIX standard text editor.

ed opens the file named by its argument. It then proceeds to read commands from its standard input. Each command is a single character, with some commands taking one or two "addresses" to indicate which lines to operate on.

Most commands also update the "current" line number, typically to the last line they affected. This is used with relative addresses, as we'll see in a moment.

  • 1,2d deletes lines 1 through 2; the current line then becomes the line that followed the deleted range, which is now line 1 of the buffer (originally line 3)
  • +3,$d deletes everything from line 4 of the shortened buffer (current line is 1, so 1 + 3 == 4; this is line 6 of the original file) through the end of the file ($ is a special address indicating the last line of the file)
  • wq writes all changes to disk and quits the editor.
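
To make the address arithmetic concrete, here is a small Python simulation of the two delete commands on the eight sample lines. It is not ed itself, only a sketch that assumes the POSIX rule for `d`: afterwards the current line is the line that followed the deleted range (line 1 of the shortened buffer after `1,2d`), or the new last line if the deletion reached the end of the buffer. The `delete` helper is a hypothetical name.

```python
# The eight sample lines, AAAA... through HHHH...
buffer = [c * 21 for c in "ABCDEFGH"]


def delete(buf, first, last):
    """Delete 1-based lines first..last in place; return the new current line."""
    del buf[first - 1:last]
    # The line that followed the deleted range now sits at position `first`;
    # if the deletion reached the end, fall back to the new last line.
    return min(first, len(buf))


current = delete(buffer, 1, 2)                      # 1,2d  -> current line is 1
current = delete(buffer, current + 3, len(buffer))  # +3,$d -> deletes lines 4..6
print(buffer)
```

After both calls the buffer holds only the C, D and E lines, which matches the end result asked for in the question.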
chepner
  • Hi chepner, thank you! Reading about "subprocess" for the first time is very interesting! Basically, if I understand the code: [1] 'original_file' should be the path (i.e. sample.txt)? ... [2] '1,2... is the first 2 lines you'd want to delete... [3] n+3 is the ones you'd want to keep... [4]\nwq\n' is to delete the lines at the end? Correct me if I'm wrong please, tq. – shongyang low Mar 17 '20 at 20:58
  • Awesome content @chepner! That said, I just gave it a shot with this: `import subprocess original_file = "C:\\Users\\USER\\Desktop\\Sample\\your_file.txt" subprocess.run(['ed', original_file], input=b'1,2d\n+3,$d\nwq\n')` Somehow it returned with `"FileNotFoundError: [WinError 2] The system cannot find the file specified"` – shongyang low Mar 17 '20 at 21:20
  • Ah, unfortunately, `ed` does not ship with Windows (it's a POSIX utility, after all). I am not sure if there is a comparable editor available, or if installing `ed` would be an option for you. – chepner Mar 17 '20 at 21:40
  • sed or awk should be available on a bash shell, if you program a lot – Kenny Ostrom Mar 17 '20 at 23:24