
I'm using Python to merge some Excel files into a single CSV file, but when doing so, the datetimes get turned into integers. So when I read the result back with pandas to work on my unified database, I need to convert them back to datetime, which is possible but seems unnecessary. The code for reading and compiling the files:

import csv
import os
from pathlib import Path

import xlrd

folder = Path('myPath')
os.chdir(folder)
files = sorted(os.listdir(os.getcwd()), key=os.path.getctime)

# open the output once so every workbook is appended instead of overwritten
with open('Unified database.csv', 'w', newline='', encoding='utf-8') as f:
    c = csv.writer(f)
    for file in files:
        with xlrd.open_workbook(folder / file) as wb:
            sh = wb.sheet_by_index(0)
            for r in range(sh.nrows):
                c.writerow(sh.row_values(r))
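
For reference, the conversion back that I'd rather skip looks roughly like this (a sketch assuming the default 1900 date system; the 'date' column name is just an example):

import pandas as pd

df = pd.read_csv('Unified database.csv')
# Excel serial dates count days from 1899-12-30 in the default 1900 system
df['date'] = pd.to_datetime(df['date'], unit='D', origin='1899-12-30')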

Is there a way to solve this in fewer steps and just write the datetime columns as strings, which pandas has a much easier time automatically identifying as dates? Even if I have to specify the datetime columns manually.
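
Something along these lines is what I have in mind, replacing the row_values call in my loop (a sketch; xlrd's xldate_as_datetime does the conversion and the cell types identify the date columns):

row = []
for col in range(sh.ncols):
    value = sh.cell_value(r, col)
    # date cells come back as floats; convert them using the workbook's datemode
    if sh.cell_type(r, col) == xlrd.XL_CELL_DATE:
        value = xlrd.xldate_as_datetime(value, wb.datemode).isoformat()
    row.append(value)
c.writerow(row)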

lczapski

1 Answer


Have you tried reading all of the Excel files directly into a pandas DataFrame? The code below is adapted from this answer on how to import multiple CSV files into pandas and concatenate them into one DataFrame. I have added parse_dates so you can specify which columns should be parsed as datetime.

import glob

import pandas as pd

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.xlsx")

li = []

for filename in all_files:
    # parse_dates tells pandas which columns to treat as datetime ('a' is an example)
    df = pd.read_excel(filename, index_col=None, header=0, parse_dates=['a'])
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
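
If you still need the unified CSV on disk, writing the concatenated frame with to_csv stores the datetime columns as ISO strings, which pandas parses back without trouble (the file name is taken from your question):

frame.to_csv('Unified database.csv', index=False)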
Rick