I am writing a script to gather results from an output file of a programme. The file contains headers, captions, and data in scientific format. I only want the data and I need a script that can do this repeatedly for different output files with the same results format.
This is the data:
```text
GROUP 1 2 3 4 5 6 7 8
VELOCITY (m/s) 59.4604E+06 55.5297E+06 52.4463E+06 49.3329E+06 45.4639E+06 41.6928E+06 37.7252E+06 34.9447E+06
GROUP 9 10 11 12 13 14 15 16
VELOCITY (m/s) 33.2405E+06 30.8868E+06 27.9475E+06 25.2880E+06 22.8815E+06 21.1951E+06 20.1614E+06 18.7338E+06
GROUP 17 18 19 20 21 22 23 24
VELOCITY (m/s) 16.9510E+06 15.7017E+06 14.9359E+06 14.2075E+06 13.5146E+06 12.8555E+06 11.6805E+06 10.5252E+06
```
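For reference, Python's built-in `float()` accepts this `E+06` notation directly, so once the `VELOCITY (m/s)` label is removed, the remaining fields of a line can be turned into numbers. A minimal sketch (the helper name `parse_velocity_line` is hypothetical, not part of the original script):

```python
def parse_velocity_line(line):
    """Turn one 'VELOCITY (m/s) ...' line into a list of floats."""
    label = "VELOCITY (m/s)"
    if not line.startswith(label):
        raise ValueError(f"not a velocity line: {line!r}")
    # The rest of the line is whitespace-separated numbers in E-notation.
    return [float(field) for field in line[len(label):].split()]


values = parse_velocity_line(
    "VELOCITY (m/s) 59.4604E+06 55.5297E+06 52.4463E+06 49.3329E+06"
)
print(values)  # [59460400.0, 55529700.0, 52446300.0, 49332900.0]
```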
This is my code at the moment. It opens the file and searches for the keyword 'INPUT:BETA', which indicates the start of the results I want to extract. It then takes the text between start_identifier and end_identifier, which mark the beginning and end of the data I want. I don't think this part needs changing, but I have included it just in case.
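One thing worth knowing about the slicing step: `str.find()` returns `-1` when a marker is missing, and slicing with `-1` does not fail; it silently keeps only the last character (or gives an empty string). A sketch of a marker helper that fails loudly instead, assuming the markers are plain substrings (the name `between` and the sample markers are hypothetical):

```python
def between(text, start, end):
    """Return text from the first `start` marker up to (not including) the next `end` marker."""
    i = text.find(start)
    if i == -1:
        raise ValueError(f"start marker {start!r} not found")
    # Look for the end marker only after the start marker.
    j = text.find(end, i + len(start))
    if j == -1:
        raise ValueError(f"end marker {end!r} not found after {start!r}")
    return text[i:j]


sample = "header\nINPUT:BETA\nSTART\nVELOCITY (m/s) 1.0E+00\nEND\ntrailer"
print(repr(between(sample, "INPUT:BETA", "END")))

# The silent failure mode of plain slicing with a missing marker:
print(repr("abc"["abc".find("x"):]))  # 'c'
```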
I then use a regex to match the lines that start with VELOCITY (m/s), as these contain the data I need. This works: it captures the rest of each such line, whitespace and all, as one string per line in a list. However, I want each numerical value to be a single element, so the next line is supposed to split each line on whitespace into individual elements.
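To make explicit what the regex step produces: `re.findall` with one capture group returns one string per matching line, so splitting each string gives one inner list per line. The sketch below also anchors the pattern with `^` under `re.MULTILINE`, so only lines that actually begin with the label are matched (the `MAX VELOCITY` line is made up purely to show the anchoring):

```python
import re

# With re.MULTILINE, ^ and $ match at every line boundary, so only lines that
# begin with the label are captured, and the group stops at the end of the line.
VELOCITY_LINE = re.compile(r"^VELOCITY \(m/s\)(.*)$", re.MULTILINE)

text = (
    "GROUP 1 2 3\n"
    "VELOCITY (m/s) 59.4604E+06 55.5297E+06 52.4463E+06\n"
    "MAX VELOCITY (m/s) 99.9999E+06\n"  # hypothetical line, skipped thanks to ^
    "GROUP 4 5 6\n"
    "VELOCITY (m/s) 49.3329E+06 45.4639E+06 41.6928E+06\n"
)

captured = VELOCITY_LINE.findall(text)
print(captured)
# [' 59.4604E+06 55.5297E+06 52.4463E+06', ' 49.3329E+06 45.4639E+06 41.6928E+06']

# Splitting each captured string yields a list of lists: one inner list per line.
rows = [s.split() for s in captured]
print(len(rows), len(rows[0]))  # 2 3
```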
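```python
import re

# file_name, start_identifier and end_identifier are defined earlier in the script
with open(file_name) as f:
    t = f.read()

# Skip to the keyword, then keep only the text between the two identifiers
t = t[t.find('INPUT:BETA'):]
t = t[t.find(start_identifier):t.find(end_identifier)]

# Capture everything after the 'VELOCITY (m/s)' label, then split on whitespace
regex = r"VELOCITY \(m\/s\)\s(.*)"
res = re.findall(regex, t)
res = [s.split() for s in res]
print(res)
print(len(res))
```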
This isn't working; here is the output:
```text
[['33.2405E+06', '30.8868E+06', '27.9475E+06', '25.2880E+06', '22.8815E+06', '21.1951E+06', '20.1614E+06', '18.7338E+06'], ['16.9510E+06', '15.7017E+06', '14.9359E+06', '14.2075E+06', '13.5146E+06', '12.8555E+06', '11.6805E+06', '10.5252E+06']]
2
```
It does split each line into separate values, but the result is a list of lists (one inner list per VELOCITY line) rather than a single flat list of values, which is what I need for the next stage of the processing.
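A minimal sketch of two standard ways to flatten a nested list like the output above, using a shortened copy of it as the example `res`:

```python
from itertools import chain

# Nested result of the split step: one inner list per VELOCITY line.
res = [
    ["33.2405E+06", "30.8868E+06", "27.9475E+06"],
    ["16.9510E+06", "15.7017E+06", "14.9359E+06"],
]

# Option 1: a nested list comprehension (outer loop first, then inner loop).
flat = [value for row in res for value in row]

# Option 2: itertools.chain.from_iterable concatenates the inner lists.
flat_chain = list(chain.from_iterable(res))

print(flat)
# ['33.2405E+06', '30.8868E+06', '27.9475E+06', '16.9510E+06', '15.7017E+06', '14.9359E+06']
```

Either form could replace the `[s.split() for s in res]` line, e.g. `res = [v for s in res for v in s.split()]`; wrapping the values in `float()` at the same point would also give numbers instead of strings.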
My question is therefore: how can I extract each value into its own element of a single flat list, leaving the rest of the data behind, in a way that works for different output files with different data?
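One possible end-to-end sketch, assuming every output file uses the same `INPUT:BETA` keyword, the same `VELOCITY (m/s)` line label, and plain-text start/end identifiers. The function names and the `RESULTS START` / `RESULTS END` markers in the example are hypothetical stand-ins for the real `start_identifier` and `end_identifier`. Unlike the original slicing, it searches for the end identifier only after the start identifier and raises an error if any marker is missing:

```python
import re
from itertools import chain

# Lines that begin with the label; the group captures the rest of the line.
VELOCITY_LINE = re.compile(r"^VELOCITY \(m/s\)(.*)$", re.MULTILINE)


def extract_velocities(text, start_identifier, end_identifier, keyword="INPUT:BETA"):
    """Return all VELOCITY (m/s) values in the marked section as one flat list of floats."""
    k = text.find(keyword)
    if k == -1:
        raise ValueError(f"keyword {keyword!r} not found")
    s = text.find(start_identifier, k)
    if s == -1:
        raise ValueError(f"start identifier {start_identifier!r} not found after {keyword!r}")
    e = text.find(end_identifier, s + len(start_identifier))
    if e == -1:
        raise ValueError(f"end identifier {end_identifier!r} not found")
    section = text[s:e]
    # One list of strings per matching line...
    rows = (line.split() for line in VELOCITY_LINE.findall(section))
    # ...flattened into a single list and converted to numbers.
    return [float(v) for v in chain.from_iterable(rows)]


def extract_velocities_from_file(file_name, start_identifier, end_identifier):
    """Read one output file and extract its velocities."""
    with open(file_name) as f:
        return extract_velocities(f.read(), start_identifier, end_identifier)


sample = (
    "some header text\n"
    "INPUT:BETA\n"
    "RESULTS START\n"
    "GROUP 1 2 3\n"
    "VELOCITY (m/s) 59.4604E+06 55.5297E+06 52.4463E+06\n"
    "GROUP 4 5 6\n"
    "VELOCITY (m/s) 49.3329E+06 45.4639E+06 41.6928E+06\n"
    "RESULTS END\n"
)
velocities = extract_velocities(sample, "RESULTS START", "RESULTS END")
print(len(velocities))  # 6
print(velocities[0])  # 59460400.0
```

Because the markers are parameters and nothing depends on how many GROUP/VELOCITY lines there are, the same call should work for any file in this format; calling `extract_velocities_from_file` in a loop over file names would process several files.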