change date format from filename and join into hourly data in multiple csv files

Question

I have a multiple of csv files whose name indicate date like

"cd191108.csv", "cd191120.csv"

And csv files contains data like this way:

GMT + TZ;Value
10:43:00;10
10:45:00;20
...

What I want to do is merge all csv files into one dataframe in pandas but with 'time' columns indicating date from filename and hour from contents of a file like

Time;value
2019-11-08 10:43:00;10

I made it through like the following:

import os
import pandas as pd
path = os.getcwd()
files = os.listdir(path)

files_csvf = [f for f in files if f[-3:] == 'csv']
files_csv

dfs=[]
for f in files_csv:
    data = pd.read_csv(f,sep=";",index_col=False)
    data['datetime']=pd.to_datetime(f[2:8])+pd.to_timedelta(data['GMT + TZ'])
    data=data.drop('GMT + TZ',axis=1)
    dfs.append(data)
df=pd.concat(dfs,ignore_index=True)
df

(refered to How to add a date from filename to a time column to make datetime column? Python Pandas)

However, the result of 'Time' columns is following:

2008-11-19 10:43:00

instead of what I expected

2019-11-08 10:43:00

I tried with "to_datetime" to correct but, it does not work..

df['Time'] =  pd.to_datetime(df['Time']

pass `yearfirst=True) so `df['Time'] = pd.to_datetime(df['Time'], yearfirst=True)` — David Erickson, Jul 17 '20 at 23:15
Thanks, David! I've also realized df['Time'] = pd.to_datetime(df['Time'], format='%y%m%d') work as well! So happy to solve my first job ;) — blackbeaans, Jul 17 '20 at 23:23

smci · Accepted Answer · 2020-09-01T00:55:00.857

0

If the first line of your date is 2008-11-19 10:43:00 instead but you expected 2019-11-08 10:43:00 then it apparently wrongly ate an extra row as header, so try tweaking:

pd.read_csv(..., header=[0])`

to make it not eat the first/second row.

Note that's the list [0] not the number 0.

edited Sep 01 '20 at 00:55

answered Jul 17 '20 at 23:25

smci

32,567
20
113
146

change date format from filename and join into hourly data in multiple csv files

1 Answers1