0

I have tried searching other posts on here, but can't seem to solve this problem. I have a CSV file in which Year, Crash_Month, Crash_Day and Crash_Time are all seperate columns in the CSV 'data_dict'. I am trying to solve the below question. How would I go about this? I have tried to use a data frame, and pandas convert to datetime, but I'm not sure if this is the right approach. Many thanks

Here's the data frame I'm trying to assign datetime to

    year  month  day       time
0   2000      1    1   4:30:59 
1   2000      1    1   0:07:35 
2   2000      1    1   4:51:37 
3   2000      1    1   4:27:56 
4   2000      1    1   2:16:31 
5   2000      1    1   0:37:21 
6   2000      1    1   0:52:57 
7   2000      1    1   3:35:14 
8   2000      1    1   2:41:58 
9   2000      1    1   3:43:02 
10  2000      1    1   3:49:19 
11  2000      1    1   3:03:55 
12  2000      1    1   4:46:01 
13  2000      1    1   1:07:24 
14  2000      1    1   8:29:04 
15  2000      1    1   6:35:21 
16  2000      1    1   6:06:25 
17  2000      1    1   7:10:13 
18  2000      1    1   10:57:24 
19  2000      1    1   7:54:38

So far, I have coded this.

import pandas as pd

df = pd.DataFrame({'year': (data_dict['Year']),
                   'month': (data_dict['Crash_Month']),
                   'day': (data_dict['Crash_Day']),
                   'time': (data_dict['Crash_Time'])})

date=pd.to_datetime(df[["year", "month", "day", "time"]],format='%YYYY%mm%dd, %HH%MM%SS')
print(date)

day_of_week = {0 : 'Monday',
              1: 'Tuesday',
              2: 'Wednesday',
              3: 'Thursday',
              4: 'Friday',
              5: 'Saturday',
              6: 'Sunday'}

month_season= {1: 'Summer',
              2: 'Summer',
              3: 'Autumn',
              4: 'Autumn',
              5: 'Autumn',
              6: 'Winter',
              7: 'Winter',
              8: 'Winter',
              9: 'Spring',
              10: 'Spring',
              11: 'Spring',
              12: 'Summer'}
nickp
  • 43
  • 6
  • 1
    can you provide a sample of your data? 5 rows will do – Umar.H Apr 22 '20 at 01:36
  • Hi @nickp please provide a [mcve](/help/mcve). – rpanai Apr 22 '20 at 01:39
  • 2
    Please [provide a reproducible copy of the DataFrame with `to_clipboard`](https://stackoverflow.com/questions/52413246/provide-a-reproducible-copy-of-the-dataframe-with-to-clipboard/52413247#52413247). [Stack Overflow Discourages Screenshots](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors). It is likely the question will be downvoted. You are discouraging assistance because no one wants to retype your data or code, and screenshots are often illegible. – Trenton McKinney Apr 22 '20 at 01:43
  • Heres the first few rows of the CSV file – nickp Apr 22 '20 at 01:49
  • Index,Age,Year,Crash_Month,Crash_Day,Crash_Time,Road_User,Gender,Crash_Type,Injury_Severity,Crash_LGA,Crash_Area_Type 1,37,2000,1,1,4:30:59,PEDESTRIAN,MALE,UNKNOWN,1,MARIBYRNONG,MELBOURNE 2,22,2000,1,1,0:07:35,DRIVER,MALE,ADJACENT DIRECTION,1,YARRA,MELBOURNE 3,47,2000,1,1,4:51:37,DRIVER,FEMALE,ADJACENT DIRECTION,0,YARRA,MELBOURNE 4,70,2000,1,1,4:27:56,DRIVER,MALE,ADJACENT DIRECTION,1,BANYULE,MELBOURNE 5,16,2000,1,1,2:16:31,PASSENGER,MALE,OPPOSING DIRECTION,0,MONASH,MELBOURNE – nickp Apr 22 '20 at 01:49
  • Here's the data frame I'm trying to assign datetime to – nickp Apr 22 '20 at 01:55
  • year,month,day,time 2000,1,1,4:30:59 2000,1,1,0:07:35 2000,1,1,4:51:37 2000,1,1,4:27:56 2000,1,1,2:16:31 2000,1,1,0:37:21 2000,1,1,0:52:57 2000,1,1,3:35:14 2000,1,1,2:41:58 2000,1,1,3:43:02 2000,1,1,3:49:19 2000,1,1,3:03:55 2000,1,1,4:46:01 2000,1,1,1:07:24 2000,1,1,8:29:04 2000,1,1,6:35:21 2000,1,1,6:06:25 2000,1,1,7:10:13 2000,1,1,10:57:24 2000,1,1,7:54:38 – nickp Apr 22 '20 at 01:56
  • @Datanovice done, thanks – nickp Apr 22 '20 at 02:06

1 Answers1

1

We can use str.zfill and string concenation with pd.to_datetime to build up your datetime.

df2['date'] = pd.to_datetime(df2['year'].astype(str) 
               + df2['month'].astype(str).str.zfill(2)
               + df2['day'].astype(str).str.zfill(2)
               + ' '
               + df2['time'].astype(str),format='%Y%m%d %H:%M:%S'
              )

    year  month  day       time                date
0   2000      1    1   4:30:59  2000-01-01 04:30:59
1   2000      1    1   0:07:35  2000-01-01 00:07:35
2   2000      1    1   4:51:37  2000-01-01 04:51:37
3   2000      1    1   4:27:56  2000-01-01 04:27:56
4   2000      1    1   2:16:31  2000-01-01 02:16:31
5   2000      1    1   0:37:21  2000-01-01 00:37:21
6   2000      1    1   0:52:57  2000-01-01 00:52:57
7   2000      1    1   3:35:14  2000-01-01 03:35:14
8   2000      1    1   2:41:58  2000-01-01 02:41:58
9   2000      1    1   3:43:02  2000-01-01 03:43:02
10  2000      1    1   3:49:19  2000-01-01 03:49:19
11  2000      1    1   3:03:55  2000-01-01 03:03:55
12  2000      1    1   4:46:01  2000-01-01 04:46:01
13  2000      1    1   1:07:24  2000-01-01 01:07:24
14  2000      1    1   8:29:04  2000-01-01 08:29:04
15  2000      1    1   6:35:21  2000-01-01 06:35:21
16  2000      1    1   6:06:25  2000-01-01 06:06:25
17  2000      1    1   7:10:13  2000-01-01 07:10:13
18  2000      1    1  10:57:24  2000-01-01 10:57:24
19  2000      1    1   7:54:38  2000-01-01 07:54:38
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • thanks. However, this did not work- there are a few records with the string value "UNKNOWN", which is presenting an error for the whole function – nickp Apr 22 '20 at 03:20
  • 1
    you add in argument in your `pd.to_datetime(...,errors='coerce')` @nickp – Umar.H Apr 22 '20 at 03:23