1

I have some code that looks like this:

import pandas as pd
import glob

root_directory = r"\\some\shared\directory"

all_files = glob.glob(f'{root_directory}\CPE_*.csv')

li = []

for filename in all_files:
    frame = pd.read_csv(filename, index_col=None, header=0, encoding='latin1')
    li.append(frame)

df = pd.concat(li, axis=0, ignore_index=True)

This code allows me to concatenate the data and create a master csv file, but I want to add a new column to each dataframe as I loop through them. The file names look something like this: CPE_02082020.csv , CPE_02092020 , etc. So the date is in the file name itself with the format of mmddyyyy. How do I add a date column to each file as I loop through and concatenate them?

2 Answers2

3

To add a date column to each dataframe you can simply add a new column to frame while looping through all_files. This method was suggested by biobirdman on a previous post. You can then use split() to get just the date from the filename.

Try:

for filename in all_files:
    frame = pd.read_csv(filename, index_col=None, header=0, encoding='latin1')
    frame['date'] = filename.split('_')[1]
    li.append(frame)

df = pd.concat(li, axis=0, ignore_index=True)

Then to convert df['date'] to datetime add:

df['date'] = pd.to_datetime(df['date'])
Blake
  • 103
  • 1
  • 5
0

You can assign the column during a for loop, let's simplify a little using a list comprehension and Pathlib:

import pandas as pd 
from pathlib import Path 

root_dir = Path('\\some\shared\directory')

all_files = root_dir.glob('*CPE_*.csv')

dfs = [
    pd.read_csv(f, index_col=None, header=0, encoding='latin1')\
            .assign(date_col=
                pd.to_datetime(f.stem.split('_')[1],format='%m%d%Y',errors='coerce'))
    for f in all_files
]


print(pd.concat(dfs))

enter image description here

halfer
  • 19,824
  • 17
  • 99
  • 186
Umar.H
  • 22,559
  • 7
  • 39
  • 74