I have multiple CSV files. For each file, I want to add a new column, filename, that contains only the numeric part of the file name. For example, if the file is named 20934_info.csv, the filename column should contain only 20934. This should run in a loop over all the CSV files. How can this be done in Python? Sample dataset:

    x   y
0   a   NaN
1   b   d

Expected output:

    x   y   filename
0   a   NaN 20934
1   b   d   20934

Code I tried:

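    import glob
    import os

    import pandas as pd

    extension = 'csv'
    files = glob.glob('*.{}'.format(extension))
    # Adds the whole stem (e.g. '20934_info') in a column named New, not just the number
    df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp).split('.')[0])
                    for fp in files])
    print(df)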
Asif Exe
  • Are you getting data from the files? Or just the file names? If it was just names, you could use `df[filename] = int("".join(r'\d+', re.findall(, filepath)))` for each file path. See: https://stackoverflow.com/questions/29517072/add-column-to-dataframe-with-constant-value and https://stackoverflow.com/questions/4289331/how-to-extract-numbers-from-a-string-in-python – Larry the Llama Nov 29 '21 at 09:32
  • Just filenames. Btw, the code you provided gives `SyntaxError: invalid syntax`. @LarrytheLlama – Asif Exe Nov 30 '21 at 13:25
  • No problem - I wrote the code without checking it - it should be something like: `df["filename"] = int("".join(re.findall(r"\d+", filepath)))` – Larry the Llama Nov 30 '21 at 19:56
  • Thanks! Now it works for a single csv file. How can I run a loop for multiple csv files? – Asif Exe Nov 30 '21 at 20:21
  • `for filename in filenames: ...` if you have all the filenames ready – Larry the Llama Nov 30 '21 at 20:23
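
Putting the comments above together, a minimal sketch of the loop might look like the following. It assumes the CSV files sit in the current directory and that every file name contains at least one digit. The helper names `numeric_part` and `load_with_filename` are made up for illustration. As in the comment's one-liner, all digit runs in the name are joined, and `int()` drops any leading zeros.

```python
import glob
import os
import re

import pandas as pd


def numeric_part(path):
    """Return the digits of the file's base name as an int (hypothetical helper)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Join every run of digits in the stem, e.g. '20934_info' -> '20934'
    digits = "".join(re.findall(r"\d+", stem))
    if not digits:
        raise ValueError(f"no digits in file name: {path!r}")
    return int(digits)


def load_with_filename(pattern="*.csv"):
    """Read each CSV matching `pattern` and add a numeric `filename` column."""
    frames = [
        pd.read_csv(fp).assign(filename=numeric_part(fp))
        for fp in sorted(glob.glob(pattern))
    ]
    if not frames:
        # Nothing matched the pattern; return an empty frame instead of failing
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling `load_with_filename("*.csv")` returns one combined DataFrame. To keep the files separate instead, the same `assign(filename=numeric_part(fp))` step could be applied inside a plain `for fp in glob.glob("*.csv"):` loop.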

0 Answers