Best way to parse a file in Python

Question

I have a text file that I need to read, identify some parts to change, and write to a new file. Here's a snippet of what the text file (which is about 600 lines long) would look similar to:

<REAPER_PROJECT 0.1 "4.731/x64" 1431724762
  RIPPLE 0
  RECORD_PATH "Audio" ""
  <RECORD_CFG
    ZXZhdxgA
  >
  <APPLYFX_CFG
  >
  LOCK 1
  <METRONOME 6 2
    VOL 0.25 0.125
    FREQ 800 1600 1
    BEATLEN 4
    SAMPLES "" ""
  >
 >

So, for example, I'd need to change "LOCK 1" to "LOCK 0". Right now I'm reading the file line by line, looking for when I hit the "LOCK" keyword and then instead of writing "LOCK 1", I write "LOCK 0" (all other lines are written as is). Pretty straightforward.
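For reference, the line-by-line approach described above might look something like the sketch below. The function names and the paths passed to `copy_with_unlock` are made up for illustration, and the pattern assumes `LOCK` sits on its own line with a single numeric value, as in the snippet.

```python
import re

# Matches a line holding only the keyword LOCK and the value 1 (not 10, 12, ...).
# Group 1 captures the leading indentation so it can be written back unchanged.
LOCK_ON = re.compile(r"^(\s*)LOCK\s+1\s*$")


def unlock_line(line):
    """Return the line with 'LOCK 1' changed to 'LOCK 0'; other lines pass through."""
    match = LOCK_ON.match(line)
    if match:
        return f"{match.group(1)}LOCK 0\n"
    return line


def copy_with_unlock(src_path, dst_path):
    """Stream src_path to dst_path one line at a time, rewriting LOCK lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(unlock_line(line))
```

Anchoring the pattern to the whole line means lines such as `LOCK 12` or `UNLOCK 1` are left untouched.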

Part of this seems kinda messy to me, though: when I use nested for loops to parse a sub-section of the text file, I run into off-by-one weirdness with the file pointer. It's not a biggie and it's manageable, but I was looking for some opinions on this. Would it make more sense to read the entire file into a list, search the list for the keywords to change, update those specific lines in the list, and then write the whole list to the new file? It seems like I would have a bit more control over things, since I wouldn't have to process the file in a linear fashion, which I'm kinda forced to do now.
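A rough sketch of the read-into-a-list idea, to make the comparison concrete. All three function names are hypothetical, and the code assumes each setting is a keyword followed by its value on one line, as in the snippet.

```python
def find_keyword(lines, keyword, start=0):
    """Return the index of the first line at or after start whose first token is keyword, or -1."""
    for index in range(start, len(lines)):
        tokens = lines[index].split()
        if tokens and tokens[0] == keyword:
            return index
    return -1


def set_value(lines, index, new_value):
    """Replace everything after the keyword on lines[index], keeping the indentation."""
    line = lines[index]
    indent = line[: len(line) - len(line.lstrip())]
    keyword = line.split()[0]
    lines[index] = f"{indent}{keyword} {new_value}\n"


def rewrite_project(src_path, dst_path):
    """Load the file into a list, set LOCK to 0, and write the list back out."""
    with open(src_path) as src:
        lines = src.readlines()
    index = find_keyword(lines, "LOCK")
    if index != -1:
        set_value(lines, index, "0")
    with open(dst_path, "w") as dst:
        dst.writelines(lines)
```

Because the lines sit in a list, any of them can be revisited by index in any order, which is the extra control described above.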

So, I guess the last sentence kinda justified why it could be advantageous to pull it all into a list, process the list, and then write it out. I'm kinda curious how others with more programming experience (as mine is somewhat limited) would tackle this kind of issue. Any other ways that would prove even more efficient?

Btw, I didn't generate this file - other software did, and I don't have any communication with the developer so I have no way of knowing what they're using to read/write the file. I'd absolutely love it if I had a neat reader that could read the file and populate it into variables and then rewrite it out, but for me to code something that would do that would be overkill for what I'm trying to accomplish.

I'm kinda tempted to rewrite my script to read it into a list as it seems like it would be a better way to go, but I thought I'd ask people what they thought before I did. My version works, but I don't mind going through the motions, either, as it's a good lesson regardless. I figured this could also be a case where there are always different ways to tackle a problem, but I'd like to try and be as efficient as possible.

UPDATE

So, I probably should have mentioned this, but I was still trying to figure out what to ask - while I need to find certain elements and change them, I can only find those elements by finding their header (i.e. "ITEM") and then replacing the element within the block. So it'll be something like this:

<METRONOME
  NAME Clicky
  SPEED fast
>
<ITEM
  LOOP 0
  NAME Mike
  FILE something.wav
  ..
>
<ITEM
  LOOP 1
  NAME Joe
  FILE anotherfile.wav
  ..
>

So the only way to identify the correct block of data is to first find the ITEM header, then keep reading until I find the NAME element, and then update the FILE element for that ITEM block. There are other elements within that block that I need to update, and the NAME element isn't the first one in the block. Also, I can't assume that NAME elements appear only in ITEM blocks.
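One way to handle this kind of header-then-field lookup is sketched below, working on a list of lines. It assumes, based only on the examples in this question, that a block opens with a line starting with `<` and closes with a line containing only `>`, and that NAME values are unquoted. The function names are made up for illustration.

```python
def iter_blocks(lines, header):
    """Yield (start, end) line indexes of every block opened by '<header'."""
    open_blocks = []  # stack of (start index, header name) for blocks not yet closed
    for index, line in enumerate(lines):
        text = line.strip()
        if text.startswith("<"):
            tokens = text[1:].split()
            open_blocks.append((index, tokens[0] if tokens else ""))
        elif text == ">" and open_blocks:
            start, name = open_blocks.pop()
            if name == header:
                yield start, index


def direct_fields(lines, start, end):
    """Map keyword -> line index for lines directly inside the block, skipping nested blocks."""
    fields = {}
    depth = 0
    for index in range(start + 1, end):
        text = lines[index].strip()
        if text.startswith("<"):
            depth += 1
        elif text == ">":
            depth -= 1
        elif depth == 0 and text:
            fields.setdefault(text.split()[0], index)
    return fields


def update_item(lines, item_name, keyword, new_value):
    """Set keyword to new_value in the ITEM block whose NAME is item_name.

    Returns True if a line was changed, False otherwise.
    """
    for start, end in iter_blocks(lines, "ITEM"):
        fields = direct_fields(lines, start, end)
        if "NAME" not in fields or keyword not in fields:
            continue
        name_parts = lines[fields["NAME"]].split(None, 1)
        if len(name_parts) < 2 or name_parts[1].strip() != item_name:
            continue
        target = lines[fields[keyword]]
        indent = target[: len(target) - len(target.lstrip())]
        lines[fields[keyword]] = f"{indent}{keyword} {new_value}\n"
        return True
    return False
```

Since the whole block is scanned before anything is changed, the position of NAME within the block does not matter, and a NAME inside a METRONOME block is ignored because only ITEM blocks are inspected.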

So maybe this really has less to do with reading it into memory and more to do with how to properly parse this type of file? Or does reading it into memory make it easier to manipulate? Sorry I didn't clarify that in the original question...

Does it need to be a real-time solution? Or does it only need to happen once? — Coolq B, May 14 '17 at 07:59
It's a script I run to build a radio show, so it happens every week. So, in a way, no, it's not a real-time solution. I'm looking for ease-of-use over speed, if that's what you were referring to.. — Mike, May 14 '17 at 14:42
I added an update to the question to clarify a little bit further.. — Mike, May 14 '17 at 15:01
I think you misunderstood my question as I'm trying to figure out if I should read the file into memory first and parse that list, or if I should just read the file line by line and make updates as I go. I bolded the question above - sorry it wasn't more clear... — Mike, May 14 '17 at 22:40
Oops, sorry! How long is each line? Based on your example text, 600 lines of that would easily fit into RAM, although it also depends on how much RAM you have. I'd assume all of it would fit comfortably within 50MB or so. — Coolq B, May 14 '17 at 22:58
There are some long lines that would be around 200 characters, but for the most part the above is representative. That's a good point about memory - I hadn't thought about that! It's interesting to weigh the different reasons why you'd go one way or another (read file directly or read it into memory). Thanks! — Mike, May 14 '17 at 23:51
Just remember, RAM is basically like an HDD/SSD, just a lot faster, except that when you turn your computer off it loses power and everything in it is lost. Other than that, they achieve the same thing: storage. If you find out how large the file is (in KB, MB, or GB), that's approximately how much RAM it will take. — Coolq B, May 14 '17 at 23:58
If you are looking for a proper parsing solution, you should look at the pyparsing library. It might be slight overkill for what you are trying to do here, but if you have an ongoing need to deal with this kind of data it may be justified. — Paul Rooney, May 16 '17 at 01:00

score 0 · Answer 1 · answered May 14 '17 at 08:01


If it has only ~600 lines, you can read it all into memory:

replace = [('LOCK 1', 'LOCK 0')]  # add further (old, new) pairs as needed
with open('read.txt') as r:
    read = r.read()
for old, new in replace:
    read = read.replace(old, new)  # str.replace returns a new string, so reassign
with open('write.txt', 'w') as w:
    w.write(read)


itzMEonTV


This is a good answer, however it will match things such as `LOCK 12`. Also did you mean for line 5 to actually be `read = read.replace(*i)`? – Coolq B May 14 '17 at 08:21
This is kinda similar to how I do it now (I have slightly more complex conditional statements and cleanup to make sure I'm matching the right things), except I wasn't using the list. But I like the possibility that I can identify certain elements by list element and access them that way out of order, but I guess I could do that in a file if I just kept track of what line I'm on and used f.seek(). It seems like either way I need to move through the file or list regardless. Should I just keep track of what line I'm on and seek around the file? – Mike May 14 '17 at 14:51

score 0 · Answer 2 · edited May 23 '17 at 12:03

Here's my answer using regex:

import re

text = """<REAPER_PROJECT 0.1 "4.731/x64" 1431724762
  RIPPLE 0
  RECORD_PATH "Audio" ""
  <RECORD_CFG
    ZXZhdxgA
  >
  <APPLYFX_CFG
  >
  LOCK 1
  <METRONOME 6 2
    VOL 0.25 0.125
    FREQ 800 1600 1
    BEATLEN 4
    SAMPLES "" ""
  >
 >
"""

# The lookahead (?!\d) keeps "LOCK 12" from matching without consuming the next character.
print(re.sub(r"LOCK 1(?!\d)", "LOCK 0", text))

If you want to write the result to disk:

with open("written.txt", 'w') as f:
    f.write(re.sub(r"LOCK 1(?!\d)", "LOCK 0", text))

EDIT

You said that you wanted it to be more flexible? I tried to put together an example, but for that I would need more information about your setup. So instead, I'll point you to some resources that could help. They will also be useful if you ever want to change or add anything, since you'll understand what to do.

https://www.youtube.com/watch?v=DRR9fOXkfRE # How regex works for python in general.
https://regexone.com/references/python # Some information about regex and python.
https://stackoverflow.com/a/5658439/4837005 # An example of using regex to replace a string.

I hope this helps.

So, I guess I should have qualified my statement a little more, I need to find elements such as "metronome", and then process the statement block that comes after it, so it's not just about replacing one line - Sorry I didn't clarify that in the question... — Mike, May 14 '17 at 14:53
Ok, thanks for saying, my answer will need changing, however regex should be good enough, I'll see what I can do — Coolq B, May 14 '17 at 21:52
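For the block case raised in the comments above, a regex can still work as long as the target block contains no nested blocks (the METRONOME block in the sample has none). The sketch below is only an illustration under that assumption; `replace_in_block` is a made-up helper name, and the block delimiters are inferred from the sample text.

```python
import re


def replace_in_block(text, header, keyword, new_value):
    """Set keyword to new_value inside every non-nested '<header ... >' block."""
    # From the '<header' line up to the first following line that holds only '>'.
    block_re = re.compile(
        rf"^[ \t]*<{re.escape(header)}\b[^\n]*\n.*?^[ \t]*>[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    # A 'keyword value...' line; group 1 keeps the indentation and the keyword.
    field_re = re.compile(
        rf"^([ \t]*{re.escape(keyword)})[ \t]+[^\n]*$", re.MULTILINE
    )

    def fix_block(match):
        # Rewrite matching field lines only within the text of this one block.
        return field_re.sub(lambda m: f"{m.group(1)} {new_value}", match.group(0))

    return block_re.sub(fix_block, text)
```

For example, `replace_in_block(text, "METRONOME", "BEATLEN", "8")` changes `BEATLEN` only inside the METRONOME block and leaves a `BEATLEN` line anywhere else alone. With nested blocks the lazy match would stop at the first inner `>`, so a line-based approach that tracks nesting depth is safer there.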
