I have several .txt files of the form /folder/blahblah_*K.txt, where the asterisk stands for a temperature in kelvin. Each file contains 3 columns (let's call them 'A', 'B' and 'C'). I would like to create a single DataFrame with a common index (the first column, 'A') and the 'B' column from each of the files.

I've gotten as far as a list of DataFrames, where each element of the list holds the entire contents of one .txt file.

In the desired DataFrame, I would like to label each 'B' column with the temperature represented by the * in the file name.

My approach so far is:

files = glob.glob("folder/blahblah*K.txt")

dataframes = []

for f in files:
    dataframes.append(pd.read_csv(f, sep='\t'))

dataframes_df = pd.DataFrame(dataframes)

Is there a way to accomplish these tasks? Is there a more efficient approach?

  • Some hints: you've got the name of the file (`f` variable), so you could extract the temperature from here using regular expressions. Also, look into `pandas.concat()`, `pandas.read_table()` (instead of using `sep`) and assigning variable to columns. – krassowski Jan 29 '19 at 21:44

1 Answer

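You can use pd.concat to combine the 'B' columns into one DataFrame. Read each file with 'A' as the index, and pass the temperatures taken from the file names as keys so that they become the column labels. This assumes each file has a header row naming the columns 'A', 'B' and 'C'. Something like:

import glob
import re

import pandas as pd

files = glob.glob("folder/blahblah*K.txt")

# temperature = the number just before "K.txt" in each file name
temps = [re.search(r'(\d+(?:\.\d+)?)K\.txt$', f).group(1) for f in files]

dataframes = []

for f in files:
    dataframes.append(pd.read_csv(f, sep='\t', index_col='A')['B'])

dataframes_df = pd.concat(dataframes, axis=1, keys=temps)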

I have not tested this but it should give you an idea what to do.
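To make the idea easier to try out, here is a minimal self-contained sketch that writes two small made-up files to a temporary folder and combines them. The helper names `extract_temperature` and `combine_b_columns` are hypothetical, and the sketch assumes tab-separated files with a header row naming the columns 'A', 'B' and 'C', and file names ending in `_<number>K.txt`; adjust both to match the real data.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def extract_temperature(path):
    """Return the number that precedes 'K.txt' at the end of a file name."""
    match = re.search(r"_(\d+(?:\.\d+)?)K\.txt$", Path(path).name)
    if match is None:
        raise ValueError(f"no temperature found in {path!r}")
    return float(match.group(1))


def combine_b_columns(paths):
    """Build one DataFrame: index 'A', one 'B' column per temperature."""
    columns = {}
    for path in paths:
        # Assumption: tab-separated file with a header row naming A, B and C.
        frame = pd.read_csv(path, sep="\t", index_col="A")
        columns[extract_temperature(path)] = frame["B"]
    # Sort by temperature so the column order does not depend on glob order.
    return pd.concat(dict(sorted(columns.items())), axis=1)


# Demo on two small made-up files written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for temp, scale in [(300, 1.0), (77, 2.0)]:
        sample = pd.DataFrame(
            {"A": [1, 2, 3], "B": [scale, 2 * scale, 3 * scale], "C": [0, 0, 0]}
        )
        sample.to_csv(Path(folder) / f"blahblah_{temp}K.txt", sep="\t", index=False)
    files = sorted(Path(folder).glob("blahblah_*K.txt"))
    combined = combine_b_columns(files)

print(combined)
```

Converting the temperatures to floats means the columns sort numerically (77 before 300) rather than as strings. Because `pd.concat` aligns on the index, files whose 'A' values differ will produce NaN in the rows they lack.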

Documentation on Merge, concat and append

mbass