I have several CSV files containing data, each of which may begin with a different number of initial comment lines, for example:
table_doi: 10.17182/hepdata.52402.v1/t7
name: Table 7
...
ABS(YRAP), < 0.1
SQRT(S) [GeV], 1960
PT [GEV], PT [GEV] LOW, PT [GEV] HIGH, D2(SIG)/DYRAP/DPT [NB/GEV]
67, 62, 72, 6.68
...
613.5, 527, 700, 1.81E-07
I would like to read only the relevant data together with its header row, which starts at the line
PT [GEV], PT [GEV] LOW, PT [GEV] HIGH, D2(SIG)/DYRAP/DPT [NB/GEV]
So the strategy I have in mind is to find the pattern PT [GEV] and start reading from that line.
However, I am not sure how to do this in Python. Could anyone help me with that?
Thank you in advance!
By the way, the function I currently have is
import os
import glob
import csv

def read_multicolumn_csv_files_into_dictionary(folderpath, dictionary):
    filepath = folderpath + '*.csv'
    files = sorted(glob.glob(filepath))
    for file in files:
        data_set = file.replace(folderpath, '').replace('.csv', '')
        dictionary[data_set] = {}
        with open(file, 'r') as data_file:
            data_pipe = csv.DictReader(data_file)
            dictionary[data_set]['pt'] = []
            dictionary[data_set]['sigma'] = []
            for row in data_pipe:
                dictionary[data_set]['pt'].append(float(row['PT [GEV]']))
                dictionary[data_set]['sigma'].append(float(row['D2(SIG)/DYRAP/DPT [NB/GEV]']))
    return dictionary
which only works if I manually delete those initial comments in the csv files.
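For reference, the strategy described above (scan for the line starting with PT [GEV], then parse from there) might be sketched roughly as follows. This is only a minimal sketch, not a tested solution for the real files: the helper name `rows_from_header` and its `marker` parameter are made up for illustration, the sample text is a shortened copy of the excerpt above with the elided lines left out, and `skipinitialspace=True` is used on the assumption that the fields are separated by a comma followed by a space, as shown in the excerpt.

```python
import csv
from itertools import chain


def rows_from_header(lines, marker="PT [GEV]"):
    """Return a csv.DictReader positioned at the first line starting with `marker`.

    `lines` can be an open text file or any iterable of strings.
    `rows_from_header` and `marker` are hypothetical names for this sketch.
    """
    lines = iter(lines)
    for line in lines:
        if line.lstrip().startswith(marker):
            # Put the header line back in front of the lines not yet consumed,
            # so that DictReader treats it as the row of field names.
            return csv.DictReader(chain([line], lines), skipinitialspace=True)
    raise ValueError(f"no line starting with {marker!r} found")


# Shortened copy of the excerpt in the question (elided lines omitted).
sample = """table_doi: 10.17182/hepdata.52402.v1/t7
name: Table 7
ABS(YRAP), < 0.1
SQRT(S) [GeV], 1960
PT [GEV], PT [GEV] LOW, PT [GEV] HIGH, D2(SIG)/DYRAP/DPT [NB/GEV]
67, 62, 72, 6.68
613.5, 527, 700, 1.81E-07
"""

reader = rows_from_header(sample.splitlines(keepends=True))
pt, sigma = [], []
for row in reader:
    pt.append(float(row["PT [GEV]"]))
    sigma.append(float(row["D2(SIG)/DYRAP/DPT [NB/GEV]"]))
```

Inside the existing function, the same helper could presumably replace the direct `csv.DictReader(data_file)` call, i.e. `data_pipe = rows_from_header(data_file)`, since an open text file is itself an iterable of lines. Matching on the start of the line rather than anywhere in it avoids picking up a comment line that merely mentions the column name.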