I have over 200 text (.txt) files; one of them is shown below as an example. I want to read them one by one, extract specific information from each, and export it to an xls file.
As an example, how can I get the following information into an xls file?
TOTAL ENERGY = -444.38126 EV
ELECTRONIC ENERGY = -840.31531 EV
CORE-CORE REPULSION = 395.93406 EV
GRADIENT NORM = 0.91931 = 0.45965 PER ATOM
DIPOLE = 2.66600 DEBYE POINT GROUP: C2v
NO. OF FILLED LEVELS = 6
IONIZATION POTENTIAL = 10.352991 EV
HOMO LUMO ENERGIES (EV) = -10.353 0.402
MOLECULAR WEIGHT = 30.0262
COSMO AREA = 60.70 SQUARE ANGSTROMS
COSMO VOLUME = 42.52 CUBIC ANGSTROMS
I read a few posts that suggest using
sed -n ".." file.txt
The problem is that even with sed this would take very long, because I would have to read one file at a time in bash and then search for each keyword, such as
HEAT OF FORMATION
TOTAL ENERGY
ELECTRONIC ENERGY
CORE-CORE REPULSION
GRADIENT NORM
DIPOLE
NO. OF FILLED LEVELS
IONIZATION POTENTIAL
HOMO LUMO ENERGIES (EV)
MOLECULAR WEIGHT
COSMO AREA
COSMO VOLUME
Then I would paste each line, one by one, into an xls file together with its corresponding information.
SUMMARY OF PM7 CALCULATION, Site No: 29451
MOPAC2016 (Version: 18.063M)
Tue Mar 20 15:08:13 2018
No. of days remaining = 349
Empirical Formula: C H2 O = 4 atoms
SYMMETRY
Formaldehyde
GEOMETRY OPTIMISED USING EIGENVECTOR FOLLOWING (EF).
SCF FIELD WAS ACHIEVED
HEAT OF FORMATION = -25.54241 KCAL/MOL = -106.86944 KJ/MOL
TOTAL ENERGY = -444.38126 EV
ELECTRONIC ENERGY = -840.31531 EV
CORE-CORE REPULSION = 395.93406 EV
GRADIENT NORM = 0.91931 = 0.45965 PER ATOM
DIPOLE = 2.66600 DEBYE POINT GROUP: C2v
NO. OF FILLED LEVELS = 6
IONIZATION POTENTIAL = 10.352991 EV
HOMO LUMO ENERGIES (EV) = -10.353 0.402
MOLECULAR WEIGHT = 30.0262
COSMO AREA = 60.70 SQUARE ANGSTROMS
COSMO VOLUME = 42.52 CUBIC ANGSTROMS
MOLECULAR DIMENSIONS (Angstroms)
Atom Atom Distance
H 3 O 1 2.00299
H 4 O 1 1.65067
H 4 C 2 0.00000
SCF CALCULATIONS = 4
WALL-CLOCK TIME = 0.309 SECONDS
COMPUTATION TIME = 0.033 SECONDS
FINAL GEOMETRY OBTAINED
SYMMETRY
Formaldehyde
O 0.00000000 +0 0.0000000 +0 0.0000000 +0 0 0 0
C 1.20614565 +1 0.0000000 +0 0.0000000 +0 1 0 0
H 1.09115836 +1 121.2760970 +1 0.0000000 +0 2 1 0
H 1.09115836 +0 121.2760970 +0 180.0000000 +0 2 1 3
3 1 4
3 2 4
I want to export the data to one CSV file, with the values listed under each other, like below:
data1
-444.38126 EV
-840.31531 EV
395.93406 EV
0.91931 = 0.45965 PER ATOM
2.66600
C2v
6
10.352991
-10.353 0.402
30.0262
60.70
42.52
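One possible way to get from the summary text to a column like the one above is sketched here in Ruby. This is only a minimal sketch: the names `LABELS` and `extract_values` are hypothetical, and it assumes every wanted line has the form `LABEL = value` exactly as in the sample file. The DIPOLE line carries two fields (the dipole and the point group), so it is split separately.

```ruby
# Labels to pull out of a MOPAC summary, in output order.
LABELS = [
  'TOTAL ENERGY', 'ELECTRONIC ENERGY', 'CORE-CORE REPULSION',
  'GRADIENT NORM', 'DIPOLE', 'NO. OF FILLED LEVELS',
  'IONIZATION POTENTIAL', 'HOMO LUMO ENERGIES (EV)',
  'MOLECULAR WEIGHT', 'COSMO AREA', 'COSMO VOLUME'
].freeze

# Returns a hash of label => text after the first "=" (assumed line layout).
def extract_values(text)
  values = {}
  text.each_line do |line|
    label, value = line.split('=', 2)
    next if value.nil? # line has no "=" at all
    label = label.strip
    values[label] = value.strip if LABELS.include?(label)
  end
  values
end

sample = <<~TXT
  TOTAL ENERGY = -444.38126 EV
  DIPOLE = 2.66600 DEBYE POINT GROUP: C2v
  HOMO LUMO ENERGIES (EV) = -10.353 0.402
  WALL-CLOCK TIME = 0.309 SECONDS
TXT

values = extract_values(sample)
# Split the DIPOLE value into the number and the point group.
dipole, point_group = values['DIPOLE'].split(/\s*DEBYE\s+POINT GROUP:\s*/)
puts values['TOTAL ENERGY']
puts dipole
puts point_group
```

Splitting on the first "=" only keeps lines such as GRADIENT NORM, which contain a second "=", intact. Stripping the units (EV, SQUARE ANGSTROMS, and so on) would need an extra step per label.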
I know how to read each of the files line by line. Let's assume the MOPAC output file is output.txt
line_num = 0
text = File.read('output.txt')
text.gsub!(/\r\n?/, "\n")
text.each_line do |line|
  print "#{line_num += 1} #{line}"
end
This reads the file line by line. Now I am trying to extract that information:
# Labels to extract from the file.
keys = [
  'TOTAL ENERGY', 'ELECTRONIC ENERGY', 'CORE-CORE REPULSION',
  'GRADIENT NORM', 'DIPOLE', 'NO. OF FILLED LEVELS',
  'IONIZATION POTENTIAL', 'HOMO LUMO ENERGIES (EV)',
  'MOLECULAR WEIGHT', 'COSMO AREA', 'COSMO VOLUME'
]
# Regexp.union escapes the literal "." and "(EV)" in the labels.
pattern = /^\s*#{Regexp.union(keys)}\s*=/
text = File.read('output.txt')
text.gsub!(/\r\n?/, "\n")
text.each_line do |line|
  # Print everything after the first "=" on each matching line.
  puts line.split('=', 2)[-1].strip if line =~ pattern
end
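To cover all 200 files and write the result as CSV, something along these lines might work. It is a sketch under assumptions, not a tested solution for real MOPAC output: `WANTED`, `summary_column`, `write_summary_csv`, the script name `extract.rb` and the output name `summary.csv` are all hypothetical, and it relies on Ruby's standard `csv` library. Each input file becomes one column, with the file name (without extension) as the header.

```ruby
require 'csv'

# Labels to extract, one CSV row per label.
WANTED = [
  'TOTAL ENERGY', 'ELECTRONIC ENERGY', 'CORE-CORE REPULSION',
  'GRADIENT NORM', 'DIPOLE', 'NO. OF FILLED LEVELS',
  'IONIZATION POTENTIAL', 'HOMO LUMO ENERGIES (EV)',
  'MOLECULAR WEIGHT', 'COSMO AREA', 'COSMO VOLUME'
].freeze

# Collects the text after the first "=" for every wanted label in one file.
def summary_column(text)
  found = {}
  text.gsub(/\r\n?/, "\n").each_line do |line|
    label, value = line.split('=', 2)
    found[label.strip] = value.strip if value && WANTED.include?(label.strip)
  end
  WANTED.map { |label| found[label] } # nil where a label is missing
end

# Writes a header row of file names, then one row per label.
def write_summary_csv(paths, csv_path)
  columns = paths.map { |path| summary_column(File.read(path)) }
  CSV.open(csv_path, 'w') do |csv|
    csv << paths.map { |path| File.basename(path, '.*') }
    WANTED.each_index { |i| csv << columns.map { |col| col[i] } }
  end
end

# Assumed usage: ruby extract.rb *.txt
write_summary_csv(ARGV, 'summary.csv') unless ARGV.empty?
```

The resulting CSV opens directly in a spreadsheet program, so a separate xls conversion step may not be needed.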