Python read from specific parts of a text file

Question

I have a text file that looks like this:

1
Subpop 0 best fitness of run: Fitness: Standardized=73.0 Adjusted=0.013513513513513514 Hits=16
2
Subpop 0 best fitness of run: Fitness: Standardized=61.0 Adjusted=0.016129032258064516 Hits=28
3
Subpop 0 best fitness of run: Fitness: Standardized=73.0 Adjusted=0.013513513513513514 Hits=16
4
Subpop 0 best fitness of run: Fitness: Standardized=70.0 Adjusted=0.014084507042253521 Hits=19
5
Subpop 0 best fitness of run: Fitness: Standardized=72.0 Adjusted=0.0136986301369863 Hits=17
6
Subpop 0 best fitness of run: Fitness: Standardized=67.0 Adjusted=0.014705882352941176 Hits=22
7
Subpop 0 best fitness of run: Fitness: Standardized=65.0 Adjusted=0.015151515151515152 Hits=24
8
Subpop 0 best fitness of run: Fitness: Standardized=73.0 Adjusted=0.013513513513513514 Hits=16
9
Subpop 0 best fitness of run: Fitness: Standardized=78.0 Adjusted=0.012658227848101266 Hits=11
10
Subpop 0 best fitness of run: Fitness: Standardized=65.0 Adjusted=0.015151515151515152 Hits=24

I am trying to use Python to extract the numbers from the "Standardized" and "Hits" fields of each line and put them in two separate lists, but I am unfamiliar with reading from files in Python. What would be the best way to do this?

What have you tried so far? Perhaps start here: http://stackoverflow.com/questions/19508703/how-to-open-a-file-through-python — jprockbelly, Nov 25 '16 at 03:05
Currently I can open the file, and I have tried putting each line's contents into a list, but from here things get reasonably complicated and I think a more elegant solution must exist. — theguyty, Nov 25 '16 at 03:13

score 2 · Accepted Answer · answered Nov 25 '16 at 03:20

We do not usually write code for people, but this looks like it might not be homework. I also want to make an important point.

A file is an iterable of newline-terminated strings. A list of newline-terminated strings is also such an iterable. So start with a list for development, and switch to an opened file later, once the code works on the in-code list. Not doing this is, in my opinion, a big mistake and a source of problems.
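
As a small illustration of that point, here is a sketch in which the same function accepts either kind of iterable. The helper name count_payoff_lines is hypothetical, and io.StringIO stands in for a real opened file.

```python
import io

def count_payoff_lines(lines):
    """Count the lines containing 'Standardized=' in any iterable of strings."""
    return sum(1 for line in lines if 'Standardized=' in line)

sample = (
    '1\n'
    'Subpop 0 best fitness of run: Fitness: Standardized=73.0 '
    'Adjusted=0.013513513513513514 Hits=16\n'
)

as_list = sample.splitlines(keepends=True)  # list of newline-terminated strings
as_file = io.StringIO(sample)               # file-like object, iterated line by line

print(count_payoff_lines(as_list))
print(count_payoff_lines(as_file))
```

Both calls see the same two lines, so code developed against the list should behave the same way against the file.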

Next, iterate over the lines and toss the 'junk' lines. Then parse the payoff lines and do whatever processing the extracted data needs. How to parse depends on the problem. Below I chose to use the splitlines and split string methods.

file = '''\
1
Subpop 0 best fitness of run: Fitness: Standardized=73.0 Adjusted=0.013513513513513514 Hits=16
2
Subpop 0 best fitness of run: Fitness: Standardized=61.0 Adjusted=0.016129032258064516 Hits=28
3
Subpop 0 best fitness of run: Fitness: Standardized=73.0 Adjusted=0.013513513513513514 Hits=16
4
Subpop 0 best fitness of run: Fitness: Standardized=70.0 Adjusted=0.014084507042253521 Hits=19
5
Subpop 0 best fitness of run: Fitness: Standardized=72.0 Adjusted=0.0136986301369863 Hits=17
'''.splitlines(keepends=True)

stand = []
hits = []

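for line in file:
    # Skip the short serial-number lines such as '1\n'.
    if len(line) < 50:
        continue
    # Splitting on '=' leaves each value at the start of fields 1, 2 and 3.
    fields = line.split('=')
    stand.append(float(fields[1].split()[0]))  # 'Standardized' value
    hits.append(int(fields[3].split()[0]))     # 'Hits' value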

print(stand)
print(hits)
# prints
# [73.0, 61.0, 73.0, 70.0, 72.0]
# [16, 28, 16, 19, 17]
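
Once the loop works on the in-code list, switching to an opened file might look like the following sketch. The function name parse_fitness and the temporary sample file are assumptions made so the example runs on its own. It also swaps the length check for a test on the text 'Standardized=', which is one alternative way to skip the serial-number lines, not the method used above.

```python
import os
import tempfile

def parse_fitness(lines):
    """Return (stand, hits) lists parsed from any iterable of lines."""
    stand, hits = [], []
    for line in lines:
        # Skip serial-number lines; this replaces the len(line) < 50 check.
        if 'Standardized=' not in line:
            continue
        fields = line.split('=')
        stand.append(float(fields[1].split()[0]))
        hits.append(int(fields[3].split()[0]))
    return stand, hits

# Write a two-record sample file (hypothetical; use your own path instead).
sample = (
    '1\n'
    'Subpop 0 best fitness of run: Fitness: Standardized=73.0 '
    'Adjusted=0.013513513513513514 Hits=16\n'
    '2\n'
    'Subpop 0 best fitness of run: Fitness: Standardized=61.0 '
    'Adjusted=0.016129032258064516 Hits=28\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# The opened file is passed exactly where the list of lines was used before.
with open(path) as f:
    stand, hits = parse_fitness(f)
os.remove(path)

print(stand)
print(hits)
# prints
# [73.0, 61.0]
# [16, 28]
```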

@MohammadYusufGhazi to skip the short lines, like the first: '1\n'. Try removing the check and see what happens. — Terry Jan Reedy, Nov 25 '16 at 03:27
@MohammadYusufGhazi 1 would have been enough for the sample data but I assumed that serial numbers increase to 10, 100, and possibly higher. 10 would probably be enough. 50 seemed sure to be safe, as in "would work with likely real datasets", which is the ultimate proof of correctness. — Terry Jan Reedy, Nov 25 '16 at 09:44
