I have location data in one data frame (y), and weather data in another data frame (weather).

I want to merge the weather data with the y data frame, but only for times and dates that have a corresponding row in y.

I've tried merge and rbind, and I get either an empty data frame or one with millions of rows when there should be ~7000.

names(y)
 [1] "ID"         "Year"       "Month"      "Day"        "Time"       "Source"    
 [7] "Source.Lat" "Source.Lon" "Target"     "Target.Lat" "Target.Lon"

names(weather)
 [1] "Target"         "Year"           "Month"          "Day"            "Time"
 [6] "Temp"           "Dew_Point_Temp" "Humidity"       "Wind_Direction" "Wind_Speed"
[11] "Pressure"       "Humidex"

all.data <- merge(y, weather, by = c("Target","Year","Month","Day","Time"))

I would like to populate the weather data in y only when Target, Year, Month, Day, and Time match, and disregard the rest.

Sample data (y):

    ID      Year  Month Day Time    Target      Lat     Lon 
1   35624   2019    06  19  11:00   Kejimkujik  46.3236 -114.1319
3   35651   2019    06  19  14:00   CNSC 2019   58.7378 -93.8194
5   35620   2019    06  19  14:00   CNSC 2019   58.7378 -93.8194
7   35624   2019    06  20  04:00   CNSC 2019   58.7378 -93.8194
9   35651   2019    06  20  05:00   CNSC 2019   58.7378 -93.8194

Sample data (weather)

    Target      Year Month Day Time    Temp DP  Hum WD  WS  Pressure
1   Kejimkujik  2019    6   1   0:00    6.5 6.1 97  32  3   99.51   
2   Kejimkujik  2019    6   1   1:00    5.9 5.6 98  30  2   99.50   
3   Kejimkujik  2019    6   1   2:00    4.9 4.7 98  31  3   99.52   
4   Kejimkujik  2019    6   1   3:00    4.4 4.3 99  32  3   99.52   
5   Kejimkujik  2019    6   1   4:00    4.1 4.0 99  24  3   99.57   
  • You're describing an inner join, which is what the `merge` line you wrote does, so no issues there. If it isn't working, there's some issue with your data, which can't be identified unless you share the data (or example data that reproduces the issue). – IceCreamToucan Oct 03 '19 at 12:31
  • Looks like `Month` has different type in `y` than in `weather`. – GKi Oct 03 '19 at 12:41
  • @GKi Is there an easy way to correct this? I'm an r baby – Courtney lR Oct 03 '19 at 12:48
  • `y$Month <- as.numeric(y$Month)` in case `Month` is numeric in `weather` – GKi Oct 03 '19 at 12:50

1 Answer

If y contains duplicate rows for the key columns, you can merge on the unique key combinations instead:

all.data <- merge(unique(y[c("Target","Year","Month","Day","Time")]), weather)
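A minimal sketch of what that does, using made-up toy data (the values below are hypothetical and only loosely follow the samples in the question). It compares merging on the unique keys with merging the full y:

```r
# Key columns shared by both data frames
keys <- c("Target", "Year", "Month", "Day", "Time")

# Toy location data: two rows share the same key combination
y <- data.frame(
  ID     = c(35651, 35620, 35624),
  Target = c("CNSC 2019", "CNSC 2019", "CNSC 2019"),
  Year   = c(2019, 2019, 2019),
  Month  = c(6, 6, 6),
  Day    = c(19, 19, 20),
  Time   = c("14:00", "14:00", "04:00"),
  stringsAsFactors = FALSE
)

# Toy weather data: one row per Target and hour (Temp values are invented)
weather <- data.frame(
  Target = rep("CNSC 2019", 3),
  Year   = 2019,
  Month  = 6,
  Day    = c(19, 20, 20),
  Time   = c("14:00", "04:00", "05:00"),
  Temp   = c(12.1, 8.4, 8.0),
  stringsAsFactors = FALSE
)

# Distinct key combinations present in y
y_keys <- unique(y[keys])

# Weather rows for those keys only; `by` defaults to the shared column names
weather_only <- merge(y_keys, weather)

# Merging the full y keeps ID and the other y columns, one row per y row
full <- merge(y, weather, by = keys)

nrow(weather_only)
nrow(full)
```

The first form returns one weather row per distinct key but drops ID and the other y columns, so it is only a fit if you need the weather records alone. If weather itself has repeated keys, the row count multiplies, which may be where the millions of rows come from.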

If Month has a different type in y than in weather (the samples show 06 in y but 6 in weather) and Month is numeric in weather, try:

y$Month <- as.numeric(y$Month)

and then use merge.
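A rough way to check this before merging is to compare the class of each key column in both data frames. The sketch below uses toy data modelled on the samples in the question (the actual column types in your data are an assumption here). The samples also suggest that Time is zero-padded in y ("04:00") but not in weather ("4:00"), which would block matches as well, so the sketch strips the leading zero too:

```r
keys <- c("Target", "Year", "Month", "Day", "Time")

# Toy data: Month is character and Time is zero-padded in y (assumed)
y <- data.frame(
  ID     = c(35624, 35651),
  Year   = c(2019, 2019),
  Month  = c("06", "06"),
  Day    = c(1, 1),
  Time   = c("04:00", "01:00"),
  Target = c("Kejimkujik", "Kejimkujik"),
  stringsAsFactors = FALSE
)

weather <- data.frame(
  Target = "Kejimkujik",
  Year   = 2019,
  Month  = 6,
  Day    = 1,
  Time   = c("0:00", "1:00", "4:00"),
  Temp   = c(6.5, 5.9, 4.1),
  stringsAsFactors = FALSE
)

# Side-by-side classes of the key columns; mismatched rows need fixing
key_classes <- data.frame(
  y       = sapply(y[keys], class),
  weather = sapply(weather[keys], class)
)
print(key_classes)

# With mismatched keys the inner join finds nothing
before <- merge(y, weather, by = keys)

# as.character() first, so a factor is converted by label, not by level code
y$Month <- as.numeric(as.character(y$Month))

# Drop a single leading zero from the hour: "04:00" becomes "4:00"
y$Time <- sub("^0([0-9])", "\\1", y$Time)

after <- merge(y, weather, by = keys)

nrow(before)
nrow(after)
```

If Day or Year also show different classes in the comparison, the same conversion applies to them.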

GKi