Python: how to split positional data from textfile

Question

I have a textfile with data which I am trying to read in Python:

OMEGA2    1.450E+00 1.500E+00 1.550E+00 1.600E+00 1.650E+00 1.700E+00
OMEGA2    1.800E+00 1.850E+00 1.900E+00 1.950E+00 2.000E+00 2.050E+00
F2REAL    1.146E+00 -1.015E+03-2.206E+03-2.618E+03-2.288E+03-1.400E+03
F2REAL    6.255E+00 -3.254E+02-8.150E+02-1.060E+03-9.749E+02-5.995E+02
F2REAL    1.754E+01 -1.530E+02-4.375E+02-5.932E+02-5.618E+02-3.536E+02
F2REAL    1.740E+01 -7.981E+01-2.525E+02-3.748E+02-3.891E+02-2.739E+02
OMEGA2    1.800E+00 1.850E+00 1.900E+00 1.950E+00 2.000E+00 2.050E+00

Now, I only want the values from lines that start with F2REAL. Per line, I want to extract 6 values: value 1 is from index 11 to index 20, value 2 from index 21 to 30, ..., value 6 from index 61 to 70.

I tried the following:

file = 'file.txt'
STR1 = 'F2REAL'

def get_data():
    with open(file) as f:
        hyd_all = f.readlines()
        for line in hyd_all:
            if STR1 in line:
                print([float(line[10:19]),float(line[20:29])])

get_data()

This does not read the E-power, as I get [1.146,-1.015,..]. How do I get it correctly?
Is there a better way than writing 10:19, 20:29, ..., 60:69? All lines of interest have 6 columns, and column i always starts at index 10*i.
I want to append each result to a matrix, which in this example has 4 rows and 6 columns.

A slice like `line[10:19]` includes the first index but not the last, i.e., line[19] is not included in the slice. Use `line[10:20]`, etc. — RootTwo, Feb 13 '21 at 09:19
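
To illustrate the comment above, here is a minimal sketch using the first F2REAL line from the question (the variable names are illustrative only). With an end index of 29, the last exponent digit is cut off, so float() sees E+0 instead of E+03:

```python
# First F2REAL line from the question: 6-char label, 4 spaces, then 10-char fields
line = "F2REAL    1.146E+00 -1.015E+03-2.206E+03-2.618E+03-2.288E+03-1.400E+03"

truncated = line[20:29]   # end index is exclusive: stops before line[29]
full = line[20:30]        # all 10 characters of the second field

print(repr(truncated), float(truncated))  # '-1.015E+0' -1.015
print(repr(full), float(full))            # '-1.015E+03' -1015.0
```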

Stefan B · Accepted Answer · 2021-02-13T09:22:54.737

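E-notation is just that: a notation. float() parses it natively, and the result is only displayed differently. The truncated values in your output come from the slices dropping the last exponent digit (line[20:29] yields '-1.015E+0').
You could use a list comprehension.
Assuming you are talking about a NumPy matrix (otherwise just switch to a pandas DataFrame):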

import numpy as np


def get_data(path: str, target: str, width: int = 10):
    values = []
    with open(path, 'r') as f:
        for line in f.readlines():
            # 'F2REAL' should be at the beginning of the line not just anywhere
            if line.startswith(target):
                # map sequential fixed widths to float
                values.append([float(line[width*i:width*(i+1)]) for i in range(1, 7)])

    return np.asarray(values)
    

print(get_data('file.txt', 'F2REAL'))

output:

[[ 1.146e+00 -1.015e+03 -2.206e+03 -2.618e+03 -2.288e+03 -1.400e+03]
 [ 6.255e+00 -3.254e+02 -8.150e+02 -1.060e+03 -9.749e+02 -5.995e+02]
 [ 1.754e+01 -1.530e+02 -4.375e+02 -5.932e+02 -5.618e+02 -3.536e+02]
 [ 1.740e+01 -7.981e+01 -2.525e+02 -3.748e+02 -3.891e+02 -2.739e+02]]
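
As a small illustration of the first point, the sketch below (names are illustrative, not part of the answer) shows that float() reads E-notation directly and that the printed form is only a formatting choice:

```python
value = float("-1.015E+03")   # float() understands E-notation natively

plain = f"{value:.1f}"        # fixed-point representation
scientific = f"{value:.3E}"   # scientific representation of the same number

print(value)       # -1015.0
print(plain)       # -1015.0
print(scientific)  # -1.015E+03
```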

score 0 · Answer 2 · answered Feb 13 '21 at 08:55

file = 'file.txt'
STR1 = 'F2REAL'

def get_data():
    with open(file) as f:
        hyd_all = f.readlines()
        for line in hyd_all:
            if STR1 in line:
                print(line[10:20],line[20:30],line[30:40],line[40:50],line[50:60],line[60:70])

get_data()

This prints the six 10-character fields of each matching line as strings; wrap each slice in float() to get numbers.

Enrico Gandini · Answer 3 · 2021-02-13T09:13:08.950

Your file falls into the category of fixed-width formatted files. I suggest you use the pandas library, which has a specific function for reading this kind of file, read_fwf. read_fwf takes a colspecs argument, which is a list of tuples, each containing the start and end of a particular column as a half-open interval (like a slice). Since your file does not have a header row, you should use header=None, and your columns will be automatically assigned numbers (which you can then change to proper names, or you can add a header row to your file).

This function recognizes the scientific notation (E+0...) and parses your numbers as actual numbers, not strings. You can then change the display format of the numbers to whatever you like.

import pandas as pd


colspecs = [(0, 6), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
df = pd.read_fwf("file.txt", header=None, colspecs=colspecs)

If possible, I suggest you use Pandas: it is a very powerful library, and you can perform a lot of operations on your data, such as plotting, queries, or calculation of statistics. The code is also very concise.

Thanks for the reply. I am familiar with Pandas, but I don't want to use it here, since I have a raw text file with a lot of other lines of different sizes, comments and more — Jeroen, Feb 13 '21 at 09:12
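
For a file that mixes data lines with comments and other records, as described in the comment above, one option that avoids pandas is to filter by prefix and pull the numbers out with a regular expression. This is a swapped-in technique, not something proposed in the thread, and the sketch assumes every value looks like d.dddE±dd with a two-digit exponent; the names `NUMBER` and `extract_rows` are hypothetical. Unlike `str.split`, the pattern also separates values that run together, such as `-1.015E+03-2.206E+03`.

```python
import re

# Assumed number shape: optional sign, digits, '.', digits, 'E', sign, two exponent digits
NUMBER = re.compile(r"[-+]?\d+\.\d+E[-+]\d{2}")


def extract_rows(lines, target="F2REAL"):
    """Return one list of floats per line that starts with `target`."""
    rows = []
    for line in lines:
        if line.startswith(target):
            rows.append([float(tok) for tok in NUMBER.findall(line)])
    return rows


sample = [
    "OMEGA2    1.450E+00 1.500E+00 1.550E+00 1.600E+00 1.650E+00 1.700E+00",
    "# a comment line of a different length",
    "F2REAL    1.146E+00 -1.015E+03-2.206E+03-2.618E+03-2.288E+03-1.400E+03",
]
rows = extract_rows(sample)
print(rows)  # [[1.146, -1015.0, -2206.0, -2618.0, -2288.0, -1400.0]]
```

Since `extract_rows` accepts any iterable of lines, an open file object can be passed in place of `sample`.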

score 0 · Answer 4 · answered Feb 13 '21 at 09:08

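def parse_scientific(s):
    # Split e.g. "1.146E+00" into mantissa and exponent (float(s) alone would also work)
    root, exp = s.strip().split('E')
    return float(root) * (10 ** int(exp))

def get_data(file='file.txt', STR1='F2REAL'):
    rows = []
    with open(file) as f:
        for line in f:
            if line.startswith(STR1):
                # six 10-character fields, starting at column 10
                item_values = [parse_scientific(line[offset*10:offset*10+10]) for offset in range(1,7)]
                rows.append(item_values)
    return rows

Each item_values list is one row to insert into your matrix.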

answered by Chance

This did not work for me: AttributeError: 'generator' object has no attribute 'split' – Jeroen Feb 13 '21 at 09:19
