I am trying to edit a file using Python. I need to add content after specific lines. Here is my code:

with open('test1.spd', 'r+') as f:
    file = f.readlines()
    for line in file:
        if '.DisplayUnit = du' in line:
            pos = line.index('.DisplayUnit = du')
            file.insert(pos + 1, '.DisplayUnit = nl')
    f.seek(0)
    f.writelines(file)
f.close()

The file has 150K+ lines, and the above code takes forever to edit it. How can I improve the performance? I am quite new to Python.

  • You need to read all the file to find the `\n` characters so you can split it by newline (that's what `readlines` does) and then find the line you are looking for, insert the thing you want and write it back. As the file gets larger this will get slower. Perhaps it's time to get a database involved rather than using a file in this way – apokryfos Apr 14 '22 at 04:25
  • You need `f.truncate()` after you do `f.writelines(file)`, in case the updated file is shorter than the original. You don't need `f.close()`; that's done automatically by `with`. – Barmar Apr 14 '22 at 04:27
  • `pos` is an index in the `line` string. Why are you using that as an insertion index in the `file` list? They don't seem related at all. – Barmar Apr 14 '22 at 04:29
  • Your program is "slow" (in fact it is never ending) because there are instances where you are modifying the list that is being looped with "for" - there are situations where you are inserting elements that will trigger the condition and thus extending the list and results in the same element being matched to extend the list, again. – metatoaster Apr 14 '22 at 04:32
  • @Barmar - I want to append a few lines to the file after a specific line. I was trying to find the position of the line and append after that. The code just takes forever. –  Apr 14 '22 at 04:33
  • If you want the index of the current line, use `for pos, line in enumerate(file):`. But as @metatoaster points out, inserting into the list that you're looping over causes problems. You can resolve this by iterating backwards. – Barmar Apr 14 '22 at 04:35
  • Another way to fix it is to append to a new list. – Barmar Apr 14 '22 at 04:36
  • @PJJ Iterating over 150k lines should be almost instant in any typical programming language. The "code just takes forever" because it's repeatedly calling `file.insert` inside the `for` loop, extending the list by one element on every iteration. Your for loop now has one additional element to iterate over each time, so it never reaches an exit condition (until your system runs out of memory, because it keeps adding new elements to the list, forever). See [this question](https://stackoverflow.com/q/5931223/), because this is effectively what you have done. – metatoaster Apr 14 '22 at 04:39
  • For the inverse problem, modifying the list in the other direction will cause elements to be skipped (so no, going backwards will in fact not fix the issue and may introduce a different bug this way) - see [thread](https://stackoverflow.com/questions/43796462/python-for-loop-is-not-looping-through-all-items). – metatoaster Apr 14 '22 at 04:44
  • You don't need `f.close()` as you are using `with open`, which closes the file automatically. Also, you have to use `f.seek(0)` to go back to the start of the file again. – user3327034 Apr 14 '22 at 04:45
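
To make the comments above concrete, here is a small bounded sketch of the question's loop. The sample lines and the `MAX_STEPS` safety cap are assumptions added for illustration; without the cap, the loop would keep running. The marker sits at character offset 2 in its line, so `pos + 1` is 3, which is also the list index of the matching line. Each insert therefore pushes the matching line one slot forward, and the loop meets it again on the next iteration.

```python
MARKER = ".DisplayUnit = du"

# Hypothetical sample data: the marker line is at list index 3,
# and the marker text starts at character offset 2 within that line.
lines = ["a\n", "b\n", "c\n", "  " + MARKER + "\n", "tail\n"]
original_length = len(lines)

MAX_STEPS = 20  # safety cap so this demonstration terminates
steps = 0

for line in lines:
    steps += 1
    if steps > MAX_STEPS:
        break
    if MARKER in line:
        pos = line.index(MARKER)  # a character offset in the string, not a list index
        # Inserting at or before the current position shifts the matching
        # line forward, so the next iteration sees the same line again.
        lines.insert(pos + 1, ".DisplayUnit = nl\n")

hit_cap = steps > MAX_STEPS
print(f"stopped by cap: {hit_cap}, list grew from {original_length} to {len(lines)} lines")
```

Whether the real file triggers this depends on where the marker lines sit relative to the computed `pos + 1`, but once an insert lands at or before the current line, the loop cannot finish on its own.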

1 Answer

You're causing an infinite loop by inserting into the list that you're looping over.

Instead, add the lines to a new list.

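with open('test1.spd', 'r+') as f:
    newfile = []
    # Build the output in a separate list instead of modifying what we iterate over
    for line in f:
        newfile.append(line)
        if '.DisplayUnit = du' in line:
            newfile.append('.DisplayUnit = nl\n')
    # Rewind and overwrite the file with the new contents
    f.seek(0)
    f.writelines(newfile)
    # Discard any leftover bytes from the old contents
    f.truncate()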

Note that you need to include the \n at the end of the line; f.writelines() doesn't add it itself.
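
As a quick illustration of that point, this sketch uses an in-memory `io.StringIO` buffer (an assumption made so the example needs no real file) to show that `writelines()` writes the strings back to back, without separators.

```python
import io

# Without trailing newlines, the items run together.
buf = io.StringIO()
buf.writelines(["first", "second"])
joined = buf.getvalue()

# With explicit newlines, each item ends up on its own line.
buf2 = io.StringIO()
buf2.writelines(["first\n", "second\n"])
separated = buf2.getvalue()

print(repr(joined))
print(repr(separated))
```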

Since you're not inserting into the list of lines in this version, there's no need to use readlines(); just iterate over the file to read one line at a time.
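
If the same edit is needed in more than one place, the approach can be wrapped in a function. The following is only a sketch: the name `insert_after`, its parameters, and the temporary demo file are hypothetical and not part of the original answer. It also adds a newline to a final line that lacks one, which is an extra assumption so that an inserted line is never glued onto the matching line.

```python
import os
import tempfile


def insert_after(path, marker, new_line):
    """Add new_line after every line containing marker; return how many were added."""
    out = []
    inserted = 0
    with open(path, "r+") as f:
        for line in f:
            # Assumption: normalize a last line that has no trailing newline
            if not line.endswith("\n"):
                line += "\n"
            out.append(line)
            if marker in line:
                out.append(new_line + "\n")
                inserted += 1
        f.seek(0)            # rewind to the start before overwriting
        f.writelines(out)
        f.truncate()         # drop any leftover old content
    return inserted


# Demo on a throwaway file so nothing real is modified
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test1.spd")
    with open(path, "w") as f:
        f.write("a\n.DisplayUnit = du\nb\n")
    count = insert_after(path, ".DisplayUnit = du", ".DisplayUnit = nl")
    with open(path) as f:
        result = f.read()

print(count)
print(result)
```

This still holds the whole output in memory, which should be fine for a file of roughly 150K lines.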

Barmar