For Part 1, I have multiple CSV files that I loop through to create new CSV files containing only summary statistics (medians). Each new file is named 'summary_' followed by the original filename. This part works.
For Part 2, I want to concatenate all of the 'summary_' files (they all have the same column names), with each row of the concatenated dataframe labelled with the name of the 'summary_' CSV file its data came from.
With Stack Overflow's help I have solved Part 1, but not Part 2. I can concatenate all of the CSV files in the folder, but I cannot restrict this to the ones with 'summary_' in the name (i.e. the new CSVs created in Part 1), and the row names are not the ones I want.
import os
import pandas as pd
import glob
## Part 1
summary_stats = ['median']
filenames = (filename for filename in os.listdir(os.curdir) if os.path.splitext(filename)[1] == '.csv')
for filename in filenames:
    df = pd.read_csv(filename)
    summary_df = df.agg(summary_stats)
    summary_df.to_csv(f'summary_{filename}')
## Part 2
path = r'/Users/Desktop/Practice code'
all_files = glob.glob(path + "/*.csv")
frames = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    frames.append(df)
frame = pd.concat(frames, axis=0, ignore_index=True)
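For reference, here is a minimal sketch of the result I am after in Part 2. It is not tested on my real data. It assumes the 'summary_' files sit directly in the folder that is searched, and that each one was written by `to_csv` as in Part 1, so its first column holds the 'median' row label. The function name `concat_summaries` is hypothetical, made up for this illustration.

```python
import glob
import os

import pandas as pd


def concat_summaries(folder):
    """Concatenate every 'summary_*.csv' in `folder`, labelling rows by file name.

    `concat_summaries` is a hypothetical helper name used only for this sketch.
    """
    # The glob pattern matches only files whose names start with 'summary_'.
    paths = sorted(glob.glob(os.path.join(folder, "summary_*.csv")))
    frames = []
    for path in paths:
        # index_col=0 reads back the 'median' row label that Part 1 wrote out.
        df = pd.read_csv(path, index_col=0)
        # Replace that label with the file name, without the '.csv' extension.
        name = os.path.splitext(os.path.basename(path))[0]
        df.index = [name] * len(df)
        frames.append(df)
    # No ignore_index here, so the file-name labels are kept as the row names.
    # pd.concat raises ValueError if no matching files were found.
    return pd.concat(frames, axis=0)
```

With the `path` variable from Part 2, this would be called as `frame = concat_summaries(path)`. The two changes compared with my attempt are the narrower glob pattern (`summary_*.csv` instead of `*.csv`) and keeping the index instead of passing `ignore_index=True`.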