I am writing a script to extract results from a program's output file. The file contains headers, captions, and data in scientific notation. I only want the data, and the script needs to work repeatedly on different output files that use the same results format.

This is the data:

    GROUP              1             2             3             4             5             6             7             8
VELOCITY (m/s)    59.4604E+06   55.5297E+06   52.4463E+06   49.3329E+06   45.4639E+06   41.6928E+06   37.7252E+06   34.9447E+06

GROUP              9            10            11            12            13            14            15            16
VELOCITY (m/s)    33.2405E+06   30.8868E+06   27.9475E+06   25.2880E+06   22.8815E+06   21.1951E+06   20.1614E+06   18.7338E+06

GROUP             17            18            19            20            21            22            23            24
VELOCITY (m/s)    16.9510E+06   15.7017E+06   14.9359E+06   14.2075E+06   13.5146E+06   12.8555E+06   11.6805E+06   10.5252E+06

This is my code at the moment. It should open the file and search for the keyword 'INPUT:BETA', which marks the start of the results I want to extract. It then takes the text between this keyword and the end identifier that marks the end of the data I want. I don't think this section needs changing, but I have included it just in case.

I have then tried to use regex to specify the lines that start with VELOCITY (m/s) as these contain the data I need. This works and extracts each line, whitespace and all, into an array. However, I want each numerical value to be a single element, so the next line is supposed to strip the whitespace out and split the lines into individual array elements.

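import re

# file_name, start_identifier and end_identifier are defined elsewhere in the script
with open(file_name) as f:
    t = f.read()
    # Keep only the text from the 'INPUT:BETA' keyword onwards
    t = t[t.find('INPUT:BETA'):]
    # Narrow down to the block between the start and end identifiers
    t = t[t.find(start_identifier):t.find(end_identifier)]
    # Capture everything after the label on each VELOCITY line
    regex = r"VELOCITY \(m\/s\)\s(.*)"
    res = re.findall(regex, t)
    # Split each captured line on whitespace (one list per line)
    res = [s.split() for s in res]
    print(res)
    print(len(res))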

This isn't working. Here is the output:

[['33.2405E+06', '30.8868E+06', '27.9475E+06', '25.2880E+06', '22.8815E+06', '21.1951E+06', '20.1614E+06', '18.7338E+06'], ['16.9510E+06', '15.7017E+06', '14.9359E+06', '14.2075E+06', '13.5146E+06', '12.8555E+06', '11.6805E+06', '10.5252E+06']]
2

It's taking out the whitespace but not putting the values into separate elements, which I need for the next stage of the processing.

My question is therefore: How can I extract each value into a separate array element, leaving the rest of the data behind, in a way that will work with different output files with different data?

MUD
  • What do you mean by 'separate elements'? – mapf Dec 22 '20 at 15:12
  • I mean I need each value (e.g. 33.2405E+06) in its own element of the array. So the total size should be 16 (or 24 if the first line is included), not 2 as it is currently. – MUD Dec 22 '20 at 15:16
  • So you want to [flatten a list of lists](https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists). –  Dec 22 '20 at 15:22
  • Yes, that did it, thank you! – MUD Dec 22 '20 at 15:45

1 Answer

Here is how you can flatten your list, which is your point 1.

import re

text = """
 GROUP              1             2             3             4             5             6             7             8
VELOCITY (m/s)    59.4604E+06   55.5297E+06   52.4463E+06   49.3329E+06   45.4639E+06   41.6928E+06   37.7252E+06   34.9447E+06

GROUP              9            10            11            12            13            14            15            16
VELOCITY (m/s)    33.2405E+06   30.8868E+06   27.9475E+06   25.2880E+06   22.8815E+06   21.1951E+06   20.1614E+06   18.7338E+06

GROUP             17            18            19            20            21            22            23            24
VELOCITY (m/s)    16.9510E+06   15.7017E+06   14.9359E+06   14.2075E+06   13.5146E+06   12.8555E+06   11.6805E+06   10.5252E+06
"""

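# Capture everything after the label on each VELOCITY line
regex = r"VELOCITY \(m\/s\)\s(.*)"
res = re.findall(regex, text)
# Split each captured line on whitespace, giving one list per line
res = [s.split() for s in res]
# Flatten the list of lists into a single list of values
res = [value for lst in res for value in lst]
print(res)
print(len(res))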
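
As a variation on the flattening step, here is a minimal sketch that uses `itertools.chain.from_iterable` and also converts each value to a `float`, on the assumption that the next processing stage needs numbers rather than strings. The function name `extract_velocities` and the shortened `sample` text are hypothetical and only there for illustration.

```python
import re
from itertools import chain

# Hypothetical, shortened sample; real input would come from the output file.
sample = """
GROUP              1             2
VELOCITY (m/s)    59.4604E+06   55.5297E+06

GROUP              3             4
VELOCITY (m/s)    52.4463E+06   49.3329E+06
"""


def extract_velocities(text):
    """Return every value on the VELOCITY lines as one flat list of floats."""
    # One captured string per VELOCITY line
    rows = re.findall(r"VELOCITY \(m/s\)\s+(.*)", text)
    # Split each row on whitespace, chain the pieces together, convert to float
    return [float(v) for v in chain.from_iterable(row.split() for row in rows)]


velocities = extract_velocities(sample)
print(velocities)
print(len(velocities))
```

Because the function takes the text as an argument, the same call should work on any output file that uses this layout, provided the `VELOCITY (m/s)` label is unchanged.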

Your regex isn't skipping your first line though. There must be an error in the rest of your code.
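
The question does not show what the identifiers are, so the following is only a guess at where to look. `str.find` returns -1 when the marker is not found, and a slice built from that value silently gives the wrong text instead of raising an error. A slice can also start after the first data line if the start marker sits further down than expected. A small sketch of a stricter slicing helper, with the hypothetical name `slice_between` and made-up markers `BEGIN` and `END`:

```python
def slice_between(text, start_marker, end_marker):
    """Return text from start_marker up to (not including) end_marker.

    Raises ValueError instead of silently mis-slicing when a marker is missing.
    """
    start = text.find(start_marker)
    if start == -1:
        raise ValueError(f"start marker {start_marker!r} not found")
    # Only look for the end marker after the start marker
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        raise ValueError(f"end marker {end_marker!r} not found")
    return text[start:end]


# Hypothetical text and markers, for illustration only
sample = "header\nBEGIN\nVELOCITY (m/s) 1.0E+00 2.0E+00\nEND\nfooter\n"
block = slice_between(sample, "BEGIN", "END")
print(block)
```

Printing the sliced block before running the regex makes it easy to see whether the first `VELOCITY (m/s)` line is still inside it.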

mapf
  • This is great, thank you! And yes, I just found the error in the rest of the code - will update the question now. – MUD Dec 22 '20 at 15:46