Import numerical data from txt file with description and column headers before data with Python

Question

I want to plot some data in a polar plot, and the code I wrote works great if I import the data from a simple txt file without any headers or text in it. The problem is that I get the data from an automatic device in the following format:

Hello word      
Hi Again        
Angle [deg] Level of radiation
-180    -1
-175    -8.17
-170    -15
-165    -13.67

At the moment, I import the data with the following code, but it doesn't work if the file contains text or headers:

for line in open('Data.txt', 'r'):
  values = [float(s) for s in line.split()]
  Position.append(values[0])
  Level.append(values[1])
  NormalizedLevel.append(values[2])

My goal is to have the first two rows stored as text to be displayed somewhere in the plot, and then the following three columns stored in three different arrays. If possible, the name of each array should be the header of its column, but if that is not possible it is not a big issue!

Any ideas? Thanks in advance!

Is it always going to be exactly two preceding text lines? If so, you could just store the lines of the file (a list, for instance from `readlines()` on what the `open` call returns) in a variable called `lines`, use the first two lines as you need (via `lines[0]` and `lines[1]`) and update your `for` loop to be `for line in lines[2:]` (using slicing syntax to get the 3rd line and beyond). – benmccallum, May 10 '19 at 14:01
Good point! I checked some historical data and sometimes there are three or four text lines before the column headers. Is it possible to use an if condition nested in a for loop that analyses the first value of the first column, line by line, until it finds a number instead of a text string? – Fabulm, May 10 '19 at 17:12
Yep, you could use my method above, but you'd first have to figure out the number of rows to skip with a loop like the one you mention. The check would be whether the value is a number. – benmccallum, May 11 '19 at 20:42

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

Skip the headers. If you know how many lines you have in the headers, you can do:

Position, Level, NormalizedLevel = [], [], []

with open('Data.txt', 'r') as ff:
    lines = ff.readlines()
    for ll in lines[3:]:  # if you have 3 lines of header as in your example
        values = [float(s) for s in ll.split()]
        Position.append(values[0])
        Level.append(values[1])
        NormalizedLevel.append(values[2])

You can grab the header from (in this case) lines[:3] for later use. It's a list.
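
As a rough illustration of the slicing idea, here is a minimal sketch that uses an in-memory copy of the sample from the question instead of the real file. The names `n_header`, `description` and `column_titles` are made up for this example, and it reads only the two columns shown in the sample.

```python
import io

# Stand-in for the device file, copied from the sample in the question.
sample = io.StringIO(
    "Hello word\n"
    "Hi Again\n"
    "Angle [deg] Level of radiation\n"
    "-180    -1\n"
    "-175    -8.17\n"
)

n_header = 3  # assumption: the header length is known in advance

lines = sample.readlines()
header = lines[:n_header]

# First two rows joined into one text block (for example for a plot annotation).
description = "\n".join(h.strip() for h in header[:2])
# The last header row holds the column titles.
column_titles = header[-1].strip()

position, level = [], []
for ll in lines[n_header:]:
    values = [float(s) for s in ll.split()]
    position.append(values[0])
    level.append(values[1])

print(description)
print(column_titles)
print(position, level)
```

Note that the column titles themselves contain spaces, so splitting that row on whitespace would not give one name per column; keeping it as a single string is the safer choice here.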

EDIT

In case you do not know the number of header lines, you could use the following code:

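header = []
Position, Level, NormalizedLevel = [], [], []

with open('Data.txt', 'r') as ff:
    lines = ff.readlines()

    for ll in lines:
        if not ll.strip():
            continue  # skip blank lines, which would raise IndexError below
        try:
            values = [float(s) for s in ll.split()]
            Position.append(values[0])
            Level.append(values[1])
            NormalizedLevel.append(values[2])
        except ValueError:
            # not a row of numbers: keep it as a header line
            header.append(ll)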

Basically, you try to split the string and convert the entries of the list to floats. If the conversion fails, a ValueError exception is raised; the code then assumes that the line is a header line and stores it in a separate list.
Since you are dealing with a variable format, I think it's the best you can do.
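
To see the try/except idea applied to the sample from the question, here is a minimal sketch wrapped in a function. The function name `split_header_and_data` and the blank-line handling are additions for illustration, not part of the code above, and the sample is given as a list of lines rather than read from a file.

```python
def split_header_and_data(lines):
    """Return (header_lines, columns) using the try/except idea."""
    header, rows = [], []
    for ll in lines:
        if not ll.strip():
            continue  # ignore blank lines instead of treating them as data
        try:
            rows.append([float(s) for s in ll.split()])
        except ValueError:
            # At least one token is not a number: treat the line as header text.
            header.append(ll.rstrip("\n"))
    # Transpose the rows into one list per column.
    columns = [list(col) for col in zip(*rows)]
    return header, columns


sample_lines = [
    "Hello word\n",
    "Hi Again\n",
    "Angle [deg] Level of radiation\n",
    "-180    -1\n",
    "\n",
    "-175    -8.17\n",
]

header, columns = split_header_and_data(sample_lines)
print(header)
print(columns)
```

One caveat with this approach: a header line made up only of numbers (a bare year, say) would be read as data, so it is worth checking a few real files from the device.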

answered May 10 '19 at 13:59

Valentino

Thanks, it works great if there are always three headers. I'm trying to figure out how to do it if the number of headers is variable. – Fabulm May 13 '19 at 14:09
To my knowledge, exceptions, at least in most runtimes, aren't great for perf. This'll do it, but it'd be best to avoid the exception I'd say by utilising a check for `isdigit` – benmccallum May 14 '19 at 14:29
@benmccallum If we are speaking of performance, it depends on how many times the exception is raised. In this case, you can expect that most of the lines will be data and only a few lines will be headers (exception raised). So it should be fine. See [here](https://stackoverflow.com/questions/2522005/cost-of-exception-handlers-in-python) or [here](https://stackoverflow.com/questions/1835756/using-try-vs-if-in-python) for a deeper discussion. You already proposed a good solution using `isdigit`, so it's pointless that I edit my answer to do the same. – Valentino May 14 '19 at 15:19

benmccallum · Answer 2 · 2019-05-13T14:37:14.313

With a variable number of header lines, you first need to work out how many header lines you have (numberOfHeaderLines). Once you know that, you can use list slicing to pull off the rest of the data.

Position, Level, NormalizedLevel = [], [], []

with open('Data.txt', 'r') as file:
    lines = file.readlines()

    numberOfHeaderLines = 0
    for line in lines:
        values = line.split()

        # perhaps store your header data somewhere

        # plain isdigit() rejects "-180" and "-8.17", so drop the sign
        # and one decimal point before checking
        if values and values[0].lstrip('+-').replace('.', '', 1).isdigit():
            break  # exit this loop now we know we're at a data row
        numberOfHeaderLines += 1  # increment

    for line in lines[numberOfHeaderLines:]:
        values = [float(s) for s in line.split()]
        Position.append(values[0])
        Level.append(values[1])
        NormalizedLevel.append(values[2])

There's probably more concise ways to do this, but (1) I'm not a Python guy, and (2) if you're new to programming it's important to know the fundamental approaches like this, which are essentially language independent.
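
One detail worth checking with an `isdigit`-based test: `str.isdigit()` is only true for strings made up entirely of digits, so it rejects the signed and decimal values in the sample data. Below is a minimal sketch of a check that tolerates a leading sign and one decimal point, without using exceptions. The helper names `looks_like_number` and `count_header_lines` are hypothetical, and the check does not recognise exponent notation such as `1e-3`.

```python
def looks_like_number(token):
    """Rough exception-free check; accepts forms like '-180' and '-8.17'."""
    unsigned = token.lstrip("+-")
    return unsigned.replace(".", "", 1).isdigit()


def count_header_lines(lines):
    """Count the lines that come before the first row starting with a number."""
    count = 0
    for line in lines:
        tokens = line.split()
        if tokens and looks_like_number(tokens[0]):
            break
        count += 1
    return count


sample_lines = [
    "Hello word\n",
    "Hi Again\n",
    "Angle [deg] Level of radiation\n",
    "-180    -1\n",
    "-175    -8.17\n",
]

print("-180".isdigit())            # plain isdigit() on a signed value
print(looks_like_number("-180"))   # sign-aware check
print(count_header_lines(sample_lines))
```

With the count in hand, `sample_lines[:count]` gives the header rows and `sample_lines[count:]` the data rows.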

That's exactly what I thought, but it doesn't work. I think it's because `values = [float(s) for s in lines.split()]` forces every entry to be a number (float), which fails on the text lines. – Fabulm, May 13 '19 at 14:07
Ah yes, that split will need to return an array of strings. Updated. — benmccallum, May 13 '19 at 14:36
