I have multiple CSV files in a folder. I want to add a "date_created" column to my dataframe that holds the creation date of the CSV file each row came from. Currently I have something like this:

import glob
import pandas as pd

df = pd.concat([pd.read_csv(f, encoding="utf-16", delimiter = "^") for f in glob.glob('*.csv')])
df.to_csv("all_together.csv")

How could I do this?

edyvedy13

1 Answer

Use DataFrame.assign with a custom function that returns each file's creation time:

import glob
import os
import platform

import pandas as pd

# Source: https://stackoverflow.com/a/39501288
def creation_date(path_to_file):
    """
    Try to get the date that a file was created, falling back to when it was
    last modified if that isn't possible.
    See http://stackoverflow.com/a/39501288/1709587 for explanation.
    """
    if platform.system() == 'Windows':
        return os.path.getctime(path_to_file)
    else:
        stat = os.stat(path_to_file)
        try:
            return stat.st_birthtime
        except AttributeError:
            # We're probably on Linux. No easy way to get creation dates here,
            # so we'll settle for when its content was last modified.
            return stat.st_mtime

# Read each file and tag every row with that file's creation time.
L = [pd.read_csv(f, encoding="utf-16", delimiter="^").assign(date_created=creation_date(f))
     for f in glob.glob('*.csv')]
df = pd.concat(L, ignore_index=True)
df.to_csv("all_together.csv")
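
Note that creation_date returns a float (seconds since the epoch), so the date_created column will hold numbers rather than readable dates. If you want datetimes instead, here is a minimal sketch of one way to do it, assuming a naive local-time value is acceptable. The names creation_timestamp and creation_datetime are hypothetical helpers introduced for illustration, not part of the code above:

```python
import os
import platform
from datetime import datetime


def creation_timestamp(path_to_file):
    """Return the creation time in seconds since the epoch.

    Falls back to the last-modified time where the platform does not
    expose a creation time (typically Linux).
    """
    if platform.system() == "Windows":
        return os.path.getctime(path_to_file)
    stat = os.stat(path_to_file)
    # st_birthtime is available on macOS/BSD; otherwise use st_mtime.
    return getattr(stat, "st_birthtime", stat.st_mtime)


def creation_datetime(path_to_file):
    """Hypothetical helper: the same value as a naive local-time datetime."""
    return datetime.fromtimestamp(creation_timestamp(path_to_file))
```

Passing creation_datetime(f) to assign in place of creation_date(f) should then give a datetime column rather than a float column.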
jezrael