0

I'm trying to gather data to test the impact of rainy weather on traffic congestion. I have two different data frames:

DF no. 1: Based on a 4-hours time frame, this shows the congestion statistics on a 6-hours block in which I have indicated with 1/0 of whether it is congested or not.

|date_time            | congestion_YN |
|---------------------| ------------- |
|2022/01/03  00:00:00 |  1            |
|2022/01/03  06:00:00 |  1            |
|2022/01/03  12:00:00 |  0            |
|2022/01/03  18:00:00 |  1            |
|2022/01/04  00:00:00 |  0            |
|2022/01/04  06:00:00 |  0            |
...

DF no. 2: Based on a daily time frame, this shows whether a particular day is a rainy day or not, also use 1/0

|date_time            | rainy_day_YN  |
|---------------------| ------------- |
|2022/01/03  00:00:00 |  1            |
|2022/01/04  00:00:00 |  0            |
|2022/01/05  00:00:00 |  1            |
...

I want to combine DF2 to DF1 so that I can see which 6-hours time block satisfies both conditions, the end DF I need should look like this

|date_time            | congestion_YN | rainy_day_YN |
|---------------------| ------------- | ------------ |
|2022/01/03  00:00:00 |  1            | 1            |
|2022/01/03  06:00:00 |  1            | 1            |
|2022/01/03  12:00:00 |  0            | 1            |
|2022/01/03  18:00:00 |  1            | 1            |
|2022/01/04  00:00:00 |  0            | 0            |
|2022/01/04  06:00:00 |  0            | 0            |
...

I just start learning python a few months ago and this is my first project. I don't know where to start or what to look for. Any helps would be greatly appreciated.

HaoAreYou
  • 3
  • 2

1 Answers1

1

First you need to make sure the date_time columns are indeed in datetime type, if they are not, then (reference)

df1['date_time'] = pd.to_datetime(df1['date_time'])
df2['date_time'] = pd.to_datetime(df2['date_time'])

Then you may add the desired column by

df1['rainy_day_YN'] = df1['date_time'].dt.date.map(df2.set_index('date_time')['rainy_day_YN'])

df1['date_time'].dt.date takes the date component out

map(mapping) function tries to map each date to another value as decided by the mapping

df2.set_index('date_time')['rainy_day_YN'] is our mapping. It is a Series using its index (which is date_time) as the key of the mapping, and using the Series' value (which is rainy_day_YN) as the value of the mapping.

You may also use pd.merge to do what the last line of code does, and that should require you to add a dummy date column in df1.

Raymond Kwok
  • 2,461
  • 2
  • 9
  • 11