
I have a series of text files formatted as follows:

Transaction Summary
Joe buys from Mindy 5 apples for 6$
Mark buys from Alex 3 apples for 5$
...
END

The number of apple transactions varies (one text file might have 2, another might have 6), but the files are all formatted the same way. I essentially want to store the lines between Transaction Summary and END.

I first consulted this method, which allowed me to print said lines, but I couldn't figure out how to store them.

Instead, I decided to just read the entire text file, store it, and then trim it down to the data I need:

with open(filename) as f:
    data = f.readlines()
# No explicit f.close() is needed: the with block closes the file.

This way I could slice this list of strings. The issue I'm having is that while I know where to start the slice (row index 1), each text file has a variable number of transactions, so I don't know how to find the index of the line that contains the "END" string.

Any input would be appreciated--thanks!

NSHAH

2 Answers


data.txt

Transaction Summary
Joe buys from Mindy 5 apples for 6$
Mark buys from Alex 3 apples for 5$
END

code

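with open('data.txt') as file:
    lines = file.readlines()

transaction = []
# Skip the header (first line) and the END marker (last line).
for line in lines[1:-1]:
    tokens = line.split(' ')
    # tokens: buyer, 'buys', 'from', seller, quantity, 'apples', 'for', price
    transaction.append((
        tokens[0],
        tokens[3],
        int(tokens[4]),
        int(tokens[7].rstrip('$\n')),
    ))

print(transaction)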

result

[('Joe', 'Mindy', 5, 6), ('Mark', 'Alex', 3, 5)]
William Lee
  • Thanks--my original plan was to utilize nltk tokenize! Question--if my text file doesn't actually end at the END line (let's say there are extraneous lines after it that I don't care about), how do I set the end index (-1) to stop specifically at a variable yet known spot (let's say when it reads the string END)? – NSHAH Oct 24 '18 at 13:14
  • You will have to explicitly find the line with END instead – William Lee Oct 25 '18 at 00:50
  • Thanks--I used enumerate() to find where to stop and stored that value as the index! – NSHAH Oct 25 '18 at 04:43
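
The enumerate() approach mentioned in the last comment could look roughly like the sketch below. The helper name lines_between and the inline sample list (standing in for the result of f.readlines()) are made up for illustration, and the sketch assumes each marker sits on a line of its own.

```python
def lines_between(lines, start="Transaction Summary", end="END"):
    """Return the lines strictly between the start and end marker lines."""
    start_idx = None
    for i, line in enumerate(lines):
        text = line.strip()
        if start_idx is None:
            # Still looking for the header line.
            if text == start:
                start_idx = i
        elif text == end:
            # Found the END marker: slice out everything in between.
            return [l.rstrip("\n") for l in lines[start_idx + 1:i]]
    raise ValueError("start or end marker not found")


# Hypothetical sample data; the line after END is ignored.
sample = [
    "Transaction Summary\n",
    "Joe buys from Mindy 5 apples for 6$\n",
    "Mark buys from Alex 3 apples for 5$\n",
    "END\n",
    "some extraneous line\n",
]
transactions = lines_between(sample)
print(transactions)
```

Because the search stops at the first END line, anything after it is never read into the result, however many transactions the file holds.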

You can try using a regex.

import re

string = """ 
Transaction Summary
Joe buys from Mindy 5 apples for 6$
Mark buys from Alex 3 apples for 5$
END
"""
print(re.findall(r"(\w+) buys from (\w+) (\d+) apples for (\d+)",string))
# [('Joe', 'Mindy', '5', '6'), ('Mark', 'Alex', '3', '5')]
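
If the file may contain extra lines after END, one possible variation is to first cut out the block between the two markers and only then apply the pattern. This is a sketch under assumptions: the sample text (including the extra line after END) is invented, and the markers are assumed to sit on lines of their own. It also converts the numeric fields to int.

```python
import re

# Hypothetical sample text; the line after END should be ignored.
text = """Transaction Summary
Joe buys from Mindy 5 apples for 6$
Mark buys from Alex 3 apples for 5$
END
Bob buys from Carol 9 apples for 9$
"""

# Non-greedy match from the header line up to the first line that is exactly END.
# DOTALL lets . span newlines; MULTILINE makes ^ and $ match at line boundaries.
block = re.search(r"^Transaction Summary\n(.*?)^END$", text, re.DOTALL | re.MULTILINE)
body = block.group(1) if block else ""

pattern = re.compile(r"(\w+) buys from (\w+) (\d+) apples for (\d+)\$")
parsed = [
    (buyer, seller, int(qty), int(price))
    for buyer, seller, qty, price in pattern.findall(body)
]
print(parsed)
```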
KC.