
I'm trying to build a Python script to calculate averages. To do so, I need to build a vector with 10 columns. Each entry comes from one of 10 different text files with many lines, and from each file I need one specific line that looks like this:

"BAR: dG =   -23.98 kcal/mol"

Each file has a different number on this line. How can I get only the number after the string "BAR: dG = " from these text files and use it as input for a vector like this:

yi = ["number from file 1", "number from file 2" , ... , "number from file 10"]

Toto

1 Answer


Here's a stab at it using regular expressions.

import re

lines = ["BAR: dG =   -23.98 kcal/mol",
         "BAR: dG =   23.98 kcal/mol",
         "BAR: dG =   +3.98 kcal/mol",
         "BAR: dG =   10 kcal/mol",
         "BAR: dG =   .1 kcal/mol",  # not handled: the pattern needs a digit before ".", so this yields 1.0
         ]


numbers = []
for line in lines:
    finds = re.findall(r'([-+]?\d+\.?\d*)', line)
    # [-+]? means zero or one of - or +
    # \d+ means one or more digits (so .1 REQUIRES a digit before the point; you could change it to \d*, but then you may pick up unwanted stuff if there is another "." somewhere)
    # \.? means zero or one decimal point
    # \d* means zero or more digits after it
    if finds:
        try:
            numbers.append(float(finds[0]))  # only one number per line or multiple?
        except ValueError:
            print('Regex did not work as expected, it extracted')
            print(finds)
    else:
        print('No number found on line:')
        print(line)
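
A tighter variant, as a sketch only: anchoring the pattern on the "BAR: dG =" label avoids picking up unrelated numbers elsewhere on a line, and an alternation lets it accept values like ".1". The names DG_PATTERN and extract_dg are made up for this illustration and are not part of the code above.

```python
import re

# Hypothetical pattern: the label, optional whitespace, then a signed decimal.
# The alternation accepts forms such as "23.98", "10", "10." and ".1".
DG_PATTERN = re.compile(r'BAR: dG =\s*([-+]?(?:\d+\.?\d*|\.\d+))')


def extract_dg(line):
    """Return the dG value on the line as a float, or None if the label is absent."""
    match = DG_PATTERN.search(line)
    return float(match.group(1)) if match else None


print(extract_dg("BAR: dG =   -23.98 kcal/mol"))
print(extract_dg("BAR: dG =   .1 kcal/mol"))
print(extract_dg("no label here 42"))
```

Because the label is part of the pattern, a line without "BAR: dG =" gives None even if it contains other numbers.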

EDIT

I think I misread. Is this what you want?

line = "BAR: dG =   -23.98 kcal/mol"
key = "BAR: dG ="
n_start = line.find(key)
if n_start > -1:
    rest_of_line = line[n_start+len(key):]
    number = float(rest_of_line.strip().split()[0])
    # strip removes leading and trailing spaces
    # split separates it at each space

EDIT 2

Try this. You gotta do some work ;)

import glob


def find_dg(line):
    key = "BAR: dG ="
    n_start = line.find(key)
    number = None
    if n_start > -1:
        rest_of_line = line[n_start+len(key):]
        # strip removes leading and trailing spaces
        # split separates it at each space
        number = float(rest_of_line.strip().split()[0])
    return number

# ex: find_dg("BAR: dG =   -23.98 kcal/mol") returns -23.98

numbers = []  # create the list once, outside the loop, so it is not reset for every file
for path in glob.iglob('*.txt'):
    # quick google search: https://stackoverflow.com/questions/3277503/how-to-read-a-file-line-by-line-into-a-list
    with open(path) as file:
        lines = file.readlines()
    number_to_add = None
    for line in lines:
        number_to_add = find_dg(line)
        if number_to_add is not None:
            break  # exit this for loop if we find a number
    numbers.append(number_to_add)  # stays None if the file has no "BAR: dG =" line
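
To tie this back to the stated goal of averaging, here is a rough sketch that builds the yi list from a set of files and takes the mean. The helper names (first_dg_in_file, collect_dg, average_dg) and the default "*.txt" pattern are assumptions for illustration. Unlike the loop above, files with no matching line are skipped rather than stored as None, so the mean stays well defined.

```python
import glob
from statistics import mean

KEY = "BAR: dG ="


def first_dg_in_file(path):
    """Return the first 'BAR: dG =' value in the file, or None if there is none."""
    with open(path) as handle:
        for line in handle:
            start = line.find(KEY)
            if start > -1:
                rest = line[start + len(KEY):].split()
                if rest:
                    return float(rest[0])
    return None


def collect_dg(paths):
    """Build the list of dG values, one per file that contains the line."""
    values = []
    for path in paths:
        value = first_dg_in_file(path)
        if value is not None:
            values.append(value)
    return values


def average_dg(pattern="*.txt"):
    """Average the dG values of all files matching the (assumed) glob pattern."""
    yi = collect_dg(sorted(glob.glob(pattern)))
    return mean(yi) if yi else None
```

Calling average_dg() in the directory that holds the ten files would return the average, or None if no file contains the line. Sorting the paths only makes the order of yi reproducible.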
likethevegetable
  • I'm a beginner with Python. Can you tell me how I can read a file, say data.txt, in your example? – Thales Souza Freire Aug 24 '21 at 22:51
  • You should post a different question or try to be more specific. You can adapt code from here: https://stackoverflow.com/questions/3277503/how-to-read-a-file-line-by-line-into-a-list You can get a list of files following this: https://stackoverflow.com/questions/33747968/getting-file-list-using-glob-in-python – likethevegetable Aug 24 '21 at 22:53
  • I added some explanation of what the end result should be. Let me know if it is better now or worse. – Thales Souza Freire Aug 25 '21 at 12:20
  • The problem is that your question is asking: "How can I get only the -23.89 after the string "BAR: dG = " from this text file and use as a input for a vector like this:", and I've shown you how to do that. But you also want to know how to read the file. See edit 2. Try my code and see what it does. – likethevegetable Aug 25 '21 at 14:12