I'm working on code to calculate various thermodynamic properties of a given set of molecules. To do so, I have to plug 9 coefficients into a set of equations to get the desired values. These coefficients, which vary from molecule to molecule, are retrieved from the NASA Thermobuild database, which has the following format:
C2Cl4 Tetrachloroethylene HF298=-5.034 kcal Burcat G3B3
3 T05/08 C 2.00CL 4.00 0.00 0.00 0.00 0 165.8322000 -21064.348
50.000 200.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-5.821898980D+03 4.158580080D+02-7.790140830D+00 1.615966138D-01-6.791370520D-04
1.598431875D-06-1.556882412D-09 0.000000000D+00-6.205198010D+03 5.774956220D+01
200.000 1000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
4.940446670D+04-1.030763621D+03 1.098508036D+01 1.645945662D-02-2.178412229D-05
1.410593520D-08-3.663931630D-12 0.000000000D+00-3.353235260D+02-2.878634227D+01
1000.000 6000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-3.067008915D+05-1.128336557D+03 1.681089243D+01-3.159107946D-04 6.850908950D-08
-7.749796920D-12 3.556100470D-16 0.000000000D+00-1.944193938D+03-5.966771040D+01
The specific numbers I need for the calculations are in bold.
(alternatively, in codeblock form so it's a bit neater and closer to the actual arrangement in the database .txt file)
C2Cl4 Tetrachloroethylene HF298=-5.034 kcal Burcat G3B3
3 T05/08 C 2.00CL 4.00 0.00 0.00 0.00 0 165.8322000 -21064.348
50.000 200.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-5.821898980D+03 4.158580080D+02-7.790140830D+00 1.615966138D-01-6.791370520D-04
1.598431875D-06-1.556882412D-09 0.000000000D+00-6.205198010D+03 5.774956220D+01
200.000 1000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
4.940446670D+04-1.030763621D+03 1.098508036D+01 1.645945662D-02-2.178412229D-05
1.410593520D-08-3.663931630D-12 0.000000000D+00-3.353235260D+02-2.878634227D+01
1000.000 6000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-3.067008915D+05-1.128336557D+03 1.681089243D+01-3.159107946D-04 6.850908950D-08
-7.749796920D-12 3.556100470D-16 0.000000000D+00-1.944193938D+03-5.966771040D+01
The database has hundreds of molecules in it, but I only need the coefficients for about 50 or so. I need a function that will go through the file, find each molecular species named in a pre-written list, pick out its coefficients, and return them so I can use them in my calculations. It should also convert the "D+NN" exponents to "E+NN" (I'm not sure why this database uses D's instead of E's to represent scientific notation).
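For the D-to-E conversion, one possible approach is sketched below. The D exponent is Fortran's marker for double-precision values, and Python's float() accepts the same number once the D is replaced with an E. The regular expression and the helper name parse_coefficients are assumptions made for illustration, not part of any library. The pattern assumes every coefficient looks like the ones in the sample record: an optional sign, digits, a decimal point, more digits, then D and a signed exponent.

```python
import re

# Assumed pattern for one Fortran-style number such as -5.821898980D+03.
# It also separates fields that run together without a space,
# e.g. "4.158580080D+02-7.790140830D+00".
D_NUMBER = re.compile(r"[-+]?\d+\.\d+D[-+]\d+")


def parse_coefficients(line):
    """Return every D-notation number found on one line as a Python float."""
    return [float(token.replace("D", "E")) for token in D_NUMBER.findall(line)]


# First coefficient line of the C2Cl4 record shown above.
row = ("-5.821898980D+03 4.158580080D+02-7.790140830D+00 "
       "1.615966138D-01-6.791370520D-04")
coefficients = parse_coefficients(row)
print(coefficients)
```

Lines without any D-notation number, such as the temperature-range lines, simply yield an empty list, so the same helper can be applied to every line of a record.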
I'm not at all familiar with SQL, so I've just been focusing on basic Python search functions. What I have so far is this:
import pandas as pd
import csv
import math
import numpy as np

species_list = []
species = pd.read_table('Species list.txt')  # list of molecular species I need coefficients for
species_temp = species['Species']
for i in range(len(species_temp)):
    species_list.append(species_temp[i])

with open('NEWNASA.TXT', 'rt') as database:  # loads massive coefficient database
    for species_name in species_list:
        species_name = species_name + " "  # to avoid returning ionic forms
        for line in database:
            if species_name in line:
                print(line)  # test to see if it's working
However, a) this stops working after finding the first molecular species, and b) I'm still not sure how to tell the code to find the specific coefficients I need for the calculations. I'm figuring it'll involve regular expressions (which I don't have much experience with, either) and indexing, but that's as far as I've gotten. Any pointers or suggestions would be much appreciated!
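On point a), a likely cause is that a file object is an iterator: the inner loop reads the file to the end while searching for the first species, so nothing is left to read for the second. A minimal sketch of a single-pass alternative follows. It rests on assumptions taken only from the sample record: the species name is the first whitespace-separated token of its header line, the first token of the next line is the number of temperature intervals, and each interval consists of one range line followed by two coefficient lines. The function name read_coefficients is hypothetical, and the sample text stands in for NEWNASA.TXT.

```python
import io
import re

# Assumed pattern for one Fortran-style number such as -5.821898980D+03.
D_NUMBER = re.compile(r"[-+]?\d+\.\d+D[-+]\d+")

SAMPLE = """\
C2Cl4 Tetrachloroethylene HF298=-5.034 kcal Burcat G3B3
3 T05/08 C 2.00CL 4.00 0.00 0.00 0.00 0 165.8322000 -21064.348
50.000 200.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-5.821898980D+03 4.158580080D+02-7.790140830D+00 1.615966138D-01-6.791370520D-04
1.598431875D-06-1.556882412D-09 0.000000000D+00-6.205198010D+03 5.774956220D+01
200.000 1000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
4.940446670D+04-1.030763621D+03 1.098508036D+01 1.645945662D-02-2.178412229D-05
1.410593520D-08-3.663931630D-12 0.000000000D+00-3.353235260D+02-2.878634227D+01
1000.000 6000.000 7 -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 0.0 19563.551
-3.067008915D+05-1.128336557D+03 1.681089243D+01-3.159107946D-04 6.850908950D-08
-7.749796920D-12 3.556100470D-16 0.000000000D+00-1.944193938D+03-5.966771040D+01
"""


def read_coefficients(lines, wanted):
    """Scan the database once; map each wanted species to its intervals."""
    wanted = set(wanted)
    results = {}
    lines = iter(lines)
    for line in lines:
        tokens = line.split()
        # Comparing the whole first token keeps ionic forms from matching.
        if not tokens or tokens[0] not in wanted:
            continue
        name = tokens[0]
        # Assumption: the next line starts with the number of intervals.
        n_intervals = int(next(lines).split()[0])
        intervals = []
        for _ in range(n_intervals):
            t_low, t_high = next(lines).split()[:2]
            numbers = []
            for _ in range(2):  # two coefficient lines per interval
                numbers += [float(tok.replace("D", "E"))
                            for tok in D_NUMBER.findall(next(lines))]
            intervals.append((float(t_low), float(t_high), numbers))
        results[name] = intervals
    return results


results = read_coefficients(io.StringIO(SAMPLE), ["C2Cl4", "H2O"])
for t_low, t_high, numbers in results["C2Cl4"]:
    print(t_low, t_high, len(numbers))
```

With the real file, the open file object from the with block could be passed in place of io.StringIO(SAMPLE), together with species_list. Each interval comes back with all ten numbers from its two coefficient lines, including the 0.000000000D+00 entry, so selecting the nine values actually needed is left to the calling code.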
Thanks!