I'm trying to write a Python script to extract lines from a file. The file is a text dump of Python suds output.
I want to:
- strip all characters except words and numbers. I don't want any "\n", "[", "]", "{", "=", etc. characters.
- find each section that starts with "ArrayOf_xsd_string"
- remove the next line, "item[] =", from the result
- grab the remaining 6 lines and create a dictionary keyed on the unique number on the fifth line (123456, 234567, 345678), with the other five lines as the values (pardon my ignorance if I'm not explaining this in Pythonic terminology)
- output the results to a file
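To make the steps above concrete, here is a rough sketch of what they might look like in code. It is only one possible reading of the requirements: the function name `parse_dump` is made up for illustration, it assumes every value in the dump is wrapped in double quotes as in the sample data, and it keeps spaces inside values so that "wordy type stuff" stays readable.

```python
import re

MARKER = "ArrayOf_xsd_string"

def parse_dump(text):
    """Return {fifth value: [the other five values]} for each marked section.

    Hypothetical helper; assumes each value is double-quoted as in the sample.
    """
    records = {}
    # Text before the first marker is discarded; each later chunk is one section.
    for chunk in text.split(MARKER)[1:]:
        # Collect the quoted values. This skips "item[] =", brackets and braces.
        values = re.findall(r'"([^"]*)"', chunk)
        if len(values) < 6:
            continue  # incomplete section, skip it
        # Keep only letters, digits and spaces (drops commas and other punctuation).
        values = [re.sub(r"[^A-Za-z0-9 ]+", "", v).strip() for v in values[:6]]
        key = values[4]                        # the unique number on the fifth line
        records[key] = values[:4] + values[5:]  # everything except the key
    return records
```

Called on the contents of the file (for example `parse_dump(open('data.txt').read())`), this would give a dictionary with one entry per section, keyed on the fifth value.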
The data in the file is a list:
[(ArrayOf_xsd_string){
item[] =
"001",
"ABCD",
"1234",
"wordy type stuff",
"123456",
"more stuff, etc",
}, (ArrayOf_xsd_string){
item[] =
"002",
"ABCD",
"1234",
"wordy type stuff",
"234567",
"more stuff, etc",
}, (ArrayOf_xsd_string){
item[] =
"003",
"ABCD",
"1234",
"wordy type stuff",
"345678",
"more stuff, etc",
}]
I tried doing a re.compile and here is my poor attempt at the code:
import re, string
f = open('data.txt', 'rb')
linelist = []
for line in f:
    line = re.compile('[\W_]+')
    line.sub('', string.printable)
    linelist.append(line)
print linelist
newlines = []
for line in linelist:
    mylines = line.split()
    if re.search(r'\w+', 'ArrayOf_xsd_string'):
        newlines.append([next(linelist) for _ in range(6)])
print newlines
I'm a Python newbie and haven't found any results on Google or Stack Overflow for how to extract a specific number of lines after finding specific text. Any help is most appreciated.
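For the general problem of taking a fixed number of lines after a matching line, one common pattern is to share a single iterator between the search loop and `itertools.islice`, so that lines consumed after a match are not visited again. The sketch below is an illustration under that assumption; `lines_after` and its `skip` parameter are hypothetical names, not part of any library.

```python
from itertools import islice

def lines_after(lines, marker, count, skip=0):
    """Yield a list of `count` stripped lines following each line containing `marker`.

    `skip` lines directly after the match are dropped first
    (for example the "item[] =" line in the suds dump).
    """
    it = iter(lines)  # one shared iterator, so consumed lines are not revisited
    for line in it:
        if marker in line:
            # islice advances `it` past the skipped lines, then takes `count` more.
            block = islice(it, skip, skip + count)
            yield [b.strip() for b in block]
```

With a file this could be used as `for block in lines_after(open('data.txt'), 'ArrayOf_xsd_string', 6, skip=1): ...`. Calling `next()` on a plain list, as in the attempt above, fails because a list is not an iterator; wrapping it with `iter()` is what makes this pattern work.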
Please ignore my code as I am taking "shots in the dark" :)
Here is what I'd like to see as the results:
123456: 001,ABCD,1234,wordy type stuff,more stuff etc
234567: 002,ABCD,1234,wordy type stuff,more stuff etc
345678: 003,ABCD,1234,wordy type stuff,more stuff etc
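The output format above could be produced from a dictionary along these lines. This is only a sketch: it assumes the data is already in a dict mapping each key string to a list of five value strings, and `format_record` and `write_records` are illustrative names.

```python
def format_record(key, values):
    """Build one output line such as '123456: 001,ABCD,...'."""
    return "{}: {}".format(key, ",".join(values))

def write_records(records, path):
    """Write one formatted line per record, ordered by key."""
    with open(path, "w") as out:
        for key in sorted(records):
            out.write(format_record(key, records[key]) + "\n")
```

Sorting the keys is a choice made here so the file order is predictable; the keys are compared as strings, which matches numeric order only when they all have the same length, as they do in the sample.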
I hope that helps with trying to interpret my flawed code.