Fastest way to read and delete the first N lines of a file in Python

First I read the file like this (I think this is the best way to read large files: Source):

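from itertools import islice

N = 50
with open("ahref.txt", "r+") as f:
    # islice stops quietly if the file has fewer than N lines,
    # whereas next(f) would raise StopIteration
    link_list = [line.rstrip("\n") for line in islice(f, N)]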

After that I run my code:

# My code here

Then I want to delete the first N lines. I found this approach (Source):

# Source: https://stackoverflow.com/questions/4710067/how-to-delete-a-specific-line-in-a-file/28057753#28057753
with open("target.txt", "r+") as f:
    d = f.readlines()
    f.seek(0)
    for i in d:
        if i != "line you want to remove...":
            f.write(i)
    f.truncate()

This code doesn't work for me, because it removes lines by matching their content and I only read the first N lines.

I can remove the lines like this:

with open("xml\\ahref.txt", "r+") as f:
    N = 5
    all_lines = f.readlines()
    f.seek(0)
    f.truncate()
    f.writelines(all_lines[N:])

But there is a problem with that:

  1. I have to read all the lines and then write all the remaining lines back, which is not fast. (There are many ways to do this, but they all need to read every line.)

What is the fastest way in terms of performance? The file is huge.

Kheersagar patel

1 Answer

The fastest way is to avoid reading the entire file into memory and instead write to a temporary output file, which you can then move over the original file if required.

Try:

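import os

N = 50  # number of leading lines to drop
# Reopen an existing output file so an interrupted copy can be resumed
mode = "r+" if os.path.isfile("output") else "w+"
with open("input", "r") as fin, open("output", mode) as fout:
    # Lines already in the output were copied by a previous run; skip them too
    for _ in fout:
        N += 1
    for index, line in enumerate(fin):
        # index is 0-based, so index >= N skips exactly the first N lines
        if index >= N:
            fout.write(line)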
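
The "move over the original file" step is not shown above. Here is a minimal sketch of the whole round trip, assuming the file is UTF-8 text and that it is acceptable to write a temporary file next to it. The function name `drop_first_lines` is hypothetical. Copying the remainder with `shutil.copyfileobj` in binary mode avoids per-line work in Python, which may help with the speed concern raised in the comments, though that is not measured here.

```python
import os
import shutil
import tempfile


def drop_first_lines(path, n):
    """Return the first n lines of path and remove them from the file."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        # readline() keeps the file position exactly at the start of the
        # first line that has to be kept
        head = [src.readline() for _ in range(n)]
        # The temporary file lives in the same directory so that the final
        # rename does not cross filesystems
        with tempfile.NamedTemporaryFile("wb", dir=directory, delete=False) as tmp:
            shutil.copyfileobj(src, tmp)  # bulk copy of everything after the head
    # Both files are closed here; swap the shortened file into place
    os.replace(tmp.name, path)
    # readline() returns b"" at end of file, so drop those entries
    # (assumption: the file is UTF-8 encoded)
    return [line.decode("utf-8").rstrip("\r\n") for line in head if line]
```

Usage would look like `link_list = drop_first_lines("ahref.txt", 50)`. Every byte after the first N lines still has to be rewritten, so the cost stays proportional to the file size. Note that the replacement file is created with the temporary file's restrictive permissions, not those of the original.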
SR3142
  • This is doing stuff in Python code, though, with quite a bit of work for each line. I doubt this is actually *faster*. Have you measured it? – no comment Oct 01 '21 at 13:53
  • @don'ttalkjustcode I measured it and you are correct: the speed is very similar to reading the entire file. The only difference is that this approach uses fewer system resources; in terms of performance they are the same. – SR3142 Oct 01 '21 at 14:27
  • @SomeoneRandom3142 If my script stops, I have to start over from the first line. Is there any solution for this? – Kheersagar patel Oct 01 '21 at 14:40
  • @Kheersagarpatel I've updated the code to show how you would start from the last line processed: you open the previous output file, count the number of lines in it, and continue processing from there. – SR3142 Oct 01 '21 at 15:05
  • @SomeoneRandom3142 But this is not working. I have 20 lines, so it always saves to the output file from the 7th line. The next run also saves from the 7th line, whereas I want it to start from the 14th line. – Kheersagar patel Oct 01 '21 at 15:30
  • @Kheersagarpatel What is the initial value of N that you start with? Is it 7? Can you share a sample of your code? – SR3142 Oct 01 '21 at 15:50
  • 50. And how do I get the first 50 lines into a list? – Kheersagar patel Oct 01 '21 at 15:54
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/237706/discussion-between-someonerandom3142-and-kheersagar-patel). – SR3142 Oct 01 '21 at 16:00
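
The resume problem discussed in the comments (each run should pick up the next batch, for example lines 8 to 14 after lines 1 to 7) can also be approached without deleting anything. This is a different technique from the answer above, sketched under assumptions: the file is UTF-8 text, it is not modified between runs, and a small sidecar file may be used to store a byte offset. The names `read_next_batch`, `save_offset`, and `ahref.offset` are hypothetical.

```python
import os


def read_next_batch(path, n, state_path):
    """Return (lines, new_offset) for the next n lines after the saved offset."""
    offset = 0
    if os.path.isfile(state_path):
        with open(state_path, "r") as state:
            offset = int(state.read().strip() or 0)
    with open(path, "rb") as f:
        f.seek(offset)  # jump straight past the lines handled by earlier runs
        batch = [f.readline() for _ in range(n)]
        new_offset = f.tell()
    # readline() returns b"" at end of file, so drop those entries
    lines = [line.decode("utf-8").rstrip("\r\n") for line in batch if line]
    return lines, new_offset


def save_offset(state_path, offset):
    """Record the offset; call this only after the batch was processed."""
    with open(state_path, "w") as state:
        state.write(str(offset))
```

A run would call `lines, offset = read_next_batch("ahref.txt", 50, "ahref.offset")`, process `lines`, and then call `save_offset("ahref.offset", offset)`. If the script stops before the offset is saved, the next run repeats the same batch instead of starting from the first line, and the large file is never rewritten.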