I want to merge two data frames on a Date_Time column of date-time dtype. The Date_Time columns contain some values that appear in both frames and some that appear in only one. I am unable to merge them so that every unique date-time row ends up in the result, with NA in the columns that are not common to both. Instead, I am getting NAs in the Date_Time column for the 2nd data frame. I have tried this in both R and Python.

Python code:

df = pd.merge(df_met, df_so2, how='left', on='Date_Time')

In R, where Date_Time was converted to date-time with as.POSIXct:

df_2 <- join(so2, met_km, type = "inner")
df3 <- merge(so2, met_km, all = TRUE)
df_4 <- merge(so2, met_km, by.x = "Date_Time", by.y = "Date_Time")

df_so2:

 X  POC  Datum        Date_Time          Date_GMT  Sample.Measurement  MDL
 1    2  WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2
 2    2  WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2
 3    2  WGS84  2015-01-01 5:00  01/01/2015 11:00                 2.1  0.2
 4    2  WGS84  2015-01-01 6:00  01/01/2015 12:00                 2.3  0.2
 5    2  WGS84  2015-01-01 7:00  01/01/2015 13:00                 1.1  0.2

df_met:

 X        Date_Time  air_temp_set_1  dew_point_temperature_set_1
 1  2015-01-01 1:00            35.6                         35.6
 2  2015-01-01 2:00            35.6                         35.6
 3  2015-01-01 3:00            35.6                         35.6
 4  2015-01-01 4:00            33.8                         33.8
 5  2015-01-01 5:00            33.2                         33.2
 6  2015-01-01 6:00            33.8                         33.8
 7  2015-01-01 7:00            33.8                         33.8

Expected Output:

      X  POC  Datum        Date_Time          Date_GMT  Sample.Measurement  MDL
 1  1.0    2  WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2
 2  2.0    2  WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2
 3  NaN  NaN    NaN  2015-01-01 1:00               NaN                 NaN  NaN
 4  NaN  NaN    NaN  2015-01-01 2:00               NaN                 NaN  NaN
Trenton McKinney

3 Answers

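# all = TRUE requests a full outer join: unmatched rows from either data frame are kept and filled with NA
merge(df_so2, df_met, by = "Date_Time", all = TRUE)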

        Date_Time X.x POC Datum         Date_GMT Sample.Measurement MDL X.y air_temp_set_1 dew_point_temperature_set_1
1 2015-01-01 1:00  NA  NA  <NA>             <NA>                 NA  NA   1           35.6                        35.6
2 2015-01-01 2:00  NA  NA  <NA>             <NA>                 NA  NA   2           35.6                        35.6
3 2015-01-01 3:00   1   2 WGS84 01/01/2015 09:00                2.3 0.2   3           35.6                        35.6
4 2015-01-01 4:00   2   2 WGS84 01/01/2015 10:00                2.5 0.2   4           33.8                        33.8
5 2015-01-01 5:00   3   2 WGS84 01/01/2015 11:00                2.1 0.2   5           33.2                        33.2
6 2015-01-01 6:00   4   2 WGS84 01/01/2015 12:00                2.3 0.2   6           33.8                        33.8
7 2015-01-01 7:00   5   2 WGS84 01/01/2015 13:00                1.1 0.2   7           33.8                        33.8
Jon Spring

An outer merge should return every row from both frames:

  • See pandas.DataFrame.merge.
  • how='outer': use the union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
  • Based on your comment, you want all the dates, not just those shown in Expected Output.
  • Add the parameter sort=True if you want the result sorted by date.
df_exp = pd.merge(df_so2, df_met, on='Date_Time', how='outer')

 X_x  POC  Datum        Date_Time          Date_GMT  Sample.Measurement  MDL  X_y  air_temp_set_1  dew_point_temperature_set_1
 1.0  2.0  WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2    3            35.6                         35.6
 2.0  2.0  WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2    4            33.8                         33.8
 3.0  2.0  WGS84  2015-01-01 5:00  01/01/2015 11:00                 2.1  0.2    5            33.2                         33.2
 4.0  2.0  WGS84  2015-01-01 6:00  01/01/2015 12:00                 2.3  0.2    6            33.8                         33.8
 5.0  2.0  WGS84  2015-01-01 7:00  01/01/2015 13:00                 1.1  0.2    7            33.8                         33.8
 NaN  NaN    NaN  2015-01-01 1:00               NaN                 NaN  NaN    1            35.6                         35.6
 NaN  NaN    NaN  2015-01-01 2:00               NaN                 NaN  NaN    2            35.6                         35.6

To keep only the df_so2 columns, drop the ones that came from df_met:

df_exp.drop(columns=['X_y', 'air_temp_set_1', 'dew_point_temperature_set_1'], inplace=True)
df_exp.rename(columns={'X_x': 'X'}, inplace=True)

   X  POC  Datum        Date_Time          Date_GMT  Sample.Measurement  MDL
 1.0  2.0  WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2
 2.0  2.0  WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2
 3.0  2.0  WGS84  2015-01-01 5:00  01/01/2015 11:00                 2.1  0.2
 4.0  2.0  WGS84  2015-01-01 6:00  01/01/2015 12:00                 2.3  0.2
 5.0  2.0  WGS84  2015-01-01 7:00  01/01/2015 13:00                 1.1  0.2
 NaN  NaN    NaN  2015-01-01 1:00               NaN                 NaN  NaN
 NaN  NaN    NaN  2015-01-01 2:00               NaN                 NaN  NaN
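
As a rough, self-contained sketch of the outer merge described above: the two small frames below are cut-down stand-ins for df_so2 and df_met, and indicator=True is an extra option that is not part of the original answer. It adds a _merge column so you can see which rows matched and which came from only one frame.

```python
import pandas as pd

# Cut-down stand-ins for the question's frames (a few values from the sample tables).
df_so2 = pd.DataFrame({
    "Date_Time": ["2015-01-01 3:00", "2015-01-01 4:00", "2015-01-01 5:00"],
    "Sample.Measurement": [2.3, 2.5, 2.1],
})
df_met = pd.DataFrame({
    "Date_Time": ["2015-01-01 1:00", "2015-01-01 2:00", "2015-01-01 3:00",
                  "2015-01-01 4:00", "2015-01-01 5:00"],
    "air_temp_set_1": [35.6, 35.6, 35.6, 33.8, 33.2],
})

# Parse the keys so both columns share the datetime64 dtype.
for frame in (df_so2, df_met):
    frame["Date_Time"] = pd.to_datetime(frame["Date_Time"])

# how="outer" keeps the union of keys; sort=True orders the result by key;
# indicator=True adds a "_merge" column ("both", "left_only", "right_only").
merged = pd.merge(df_so2, df_met, on="Date_Time", how="outer",
                  sort=True, indicator=True)
print(merged)
```

Rows marked right_only are the timestamps that exist only in the met data, so their SO2 columns are NaN, which is the behaviour the question asks for.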
Trenton McKinney

df_exp = pd.merge(df_so2, df_met, on='Date_Time', how='outer')

I got:

 POC   Datum        Date_Time           Date_GMT   Sample.Measurement   MDL   air_temp_set_1   dew_point_temperature_set_1   relative_humidity_set_1   wind_speed_set_1   cloud_layer_1_code_set_1   wind_direction_set_1   pressure_set_1d   weather_cond_code_set_1   visibility_set_1  wind_cardinal_direction_set_1d  weather_condition_set_1d
    2  WGS84   2015-01-01 3:00  01/01/2015 09:00                   2.3   0.2             35.6                          35.6                     100.0                0.0                       14.0                    0.0         29.943333                       9.0               0.25                              N                       Fog
    1  WGS84   2015-01-01 3:00  01/01/2015 09:00                   0.6   2.0             35.6                          35.6                     100.0                0.0                       14.0                    0.0         29.943333                       9.0               0.25                              N                       Fog
    1  WGS84   2015-01-01 3:00  01/01/2015 12:00                   7.4   0.2             35.6                          35.6                     100.0                0.0                       14.0                    0.0         29.943333                       9.0               0.25                              N                       Fog
    1  WGS84   2015-01-01 3:00  01/01/2015 10:00                   1.0   0.2             35.6                           NaN                       NaN                NaN                        NaN                    NaN               NaN                       NaN                NaN                             NaN                      NaN

Notes:

  • Check df_met.info() and df_so2.info() and verify that Date_Time is non-null with dtype datetime64[ns] in both frames.
  • If it is not, convert both columns before merging:
  • df_so2.Date_Time = pd.to_datetime(df_so2.Date_Time)
  • df_met.Date_Time = pd.to_datetime(df_met.Date_Time)
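
To illustrate why the dtype check matters, here is a minimal sketch with two hypothetical frames (the names left, right, so2 and air_temp are made up for the example). It assumes the keys are stored as text with different zero-padding, which is only one possible cause of unmatched rows and may not be what happened in your data.

```python
import pandas as pd

# Hypothetical frames: the same instants written with different string formats.
left = pd.DataFrame({"Date_Time": ["2015-01-01 3:00", "2015-01-01 4:00"],
                     "so2": [2.3, 2.5]})
right = pd.DataFrame({"Date_Time": ["2015-01-01 03:00", "2015-01-01 04:00"],
                      "air_temp": [35.6, 33.8]})

# As plain strings the keys differ, so nothing matches in the outer merge.
as_text = pd.merge(left, right, on="Date_Time", how="outer")

# After parsing, both key columns are datetime64 and the rows line up.
left["Date_Time"] = pd.to_datetime(left["Date_Time"])
right["Date_Time"] = pd.to_datetime(right["Date_Time"])
keys_are_datetime = (pd.api.types.is_datetime64_any_dtype(left["Date_Time"])
                     and pd.api.types.is_datetime64_any_dtype(right["Date_Time"]))
as_dates = pd.merge(left, right, on="Date_Time", how="outer")

# Compare the row counts of the two results.
print(keys_are_datetime, len(as_text), len(as_dates))
```

If the merge on text keys returns more rows than the merge on parsed keys, the mismatch is in how the timestamps are written rather than in the merge call itself.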
Nimantha