I have multiple CSV files. For each file, I want to add a new column, filename, that contains only the numeric part of the file name. For example, if the file name is 20934_info.csv, the filename column should contain 20934. This should be applied to every CSV file in a loop. How can this be done in Python?
Sample dataset:
   x    y
0  a  NaN
1  b    d
Expected output:
   x    y  filename
0  a  NaN     20934
1  b    d     20934
Code I tried:
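import glob
import os
import pandas as pd
extension = 'csv'
files = glob.glob('*.{}'.format(extension))
# Adds the base name up to the first '.', e.g. '20934_info', as a column named New
df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp).split('.')[0])
                for fp in files])
print(df)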
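
One possible direction, as a rough sketch rather than a definitive answer: pull the leading digits out of each base name with a regular expression and assign them to the filename column. The helper names numeric_prefix and load_with_filename are hypothetical, and the sketch assumes the number always sits at the start of the file name (as in 20934_info.csv) and that the files share the same columns.

```python
import glob
import os
import re

import pandas as pd


def numeric_prefix(path):
    """Return the leading digits of the base name, e.g. '20934' for '20934_info.csv'."""
    match = re.match(r"\d+", os.path.basename(path))
    # Assumption: a file name with no leading number yields None instead of raising.
    return match.group(0) if match else None


def load_with_filename(pattern="*.csv"):
    """Read every CSV matching pattern and tag each row with its numeric prefix."""
    frames = [
        pd.read_csv(fp).assign(filename=numeric_prefix(fp))
        for fp in sorted(glob.glob(pattern))
    ]
    # pd.concat raises ValueError on an empty list, so return an empty frame instead.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Calling load_with_filename() in the directory that holds the files would return one combined DataFrame. The filename values are strings here; wrapping the match in int() is one way to get integers if that matters.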