2

I am trying to merge two dataframes by date in R.
The first dataframe records daily temperatures. It has only 28 rows, and no dates are repeated.

head(df1)

Day MaxTemp MinTemp
2019-06-15 23.8 14.4
2019-06-16 24.9 11.7
2019-06-17 23.2 8.7

The second dataframe records hourly temperatures, and so has many more rows, with dates repeated.

head(df2)

Day Hour Temp
2019-06-15 14 22.8
2019-06-15 15 22.4
2019-06-15 16 21.9

I would like to merge the data to look something like this:

Day MaxTemp MinTemp Hour Temp 
2019-06-15 14 22.8 23.8 14.4
2019-06-15 15 22.4 23.8 14.4
2019-06-15 16 21.9 23.8 14.4

But what I end up with is:

allData <-merge(df1, df2, by="Day", all.y=T)

head(allData)

Day Hour Temp MaxTemp MinTemp
2019-06-15 14 22.8 NA NA
2019-06-15 15 22.4 NA NA
2019-06-15 16 21.9 NA NA

Or if I try "all = T" in the arguments I get "Error in x[[n]][i] <- value[[n]] : replacement has length zero".
Does anyone have any idea how I can fix this?

Edit:

# head of df1

df1 <- structure(list(Day = structure(list(sec = c(0, 0, 0, 0, 0, 0), 
min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L, 0L, 
0L, 0L), mday = 15:20, mon = c(5L, 5L, 5L, 5L, 5L, 5L), year = c(119L, 
119L, 119L, 119L, 119L, 119L), wday = c(6L, 0L, 1L, 2L, 3L, 
4L), yday = 165:170, isdst = c(1L, 1L, 1L, 1L, 1L, 1L), zone = c("CDT", 
"CDT", "CDT", "CDT", "CDT", "CDT"), gmtoff = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
)), class = c("POSIXlt", "POSIXt")), Max = c(23.8, 24.9, 23.2, 22.4, 25.1, 24.4), Min = c(14.4, 11.7, 8.7, 8.7, 9.8, 10)), row.names = c(NA, 6L), class ="data.frame")

# head of df2
df2 <- structure(list(Date = structure(list(sec = c(0, 0, 0, 0, 0, 0), 
min = c(0L,30L, 0L, 30L, 0L, 30L), hour = c(14L, 14L, 15L, 15L, 16L, 16L),
mday = c(15L, 15L, 15L, 15L, 15L, 15L), mon = c(5L, 5L, 5L, 5L, 5L, 5L), 
year = c(119L, 119L, 119L, 119L, 119L, 119L), wday = c(6L, 6L, 6L, 6L, 6L,
6L), yday = c(165L,165L, 165L, 165L, 165L, 165L), isdst = c(1L, 1L, 1L, 1L,
1L, 1L), zone =c("CDT", "CDT", "CDT", "CDT", "CDT", "CDT"), gmtoff = 
c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_)),class = c("POSIXlt","POSIXt")), Temp = c(22.8, 22.4, 22.4, 
22.3,21.9, 21.3), Hour =c(14L, 14L, 15L, 15L, 16L, 16L), Day =
structure(c(18062,18062, 18062, 18062, 18062, 18062), class = "Date")), 
row.names= c(NA, 6L), class = "data.frame")
MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • 1
    Rather than sharing `head(df1)`, can you share `dput(head(df1))`? That will tell us exactly how your values are coded and will let us copy/paste the code into R to run it ourselves. – MrFlick Aug 17 '20 at 20:25
  • It might be a dupe of something, but it isn't a dupe of *that* question: the only mention of `class` in that question and its answers is in the benchmarking, not as a problem. The question has an otherwise correct use of `merge` and expectation of its return value. – r2evans Aug 17 '20 at 21:05
  • My apologies if I mis-interpreted the question/dupe-status, it was not intentional. – r2evans Aug 17 '20 at 21:08
  • I'm not reopening it to spite you, akrun, just identifying *why* I reopened it despite your closure. – r2evans Aug 17 '20 at 21:09
  • here also, it is not intentional. I found the merge as a very commonly asked question. Correct me, if I am wrong. I don't usually check who wrote an answer before I dupe tag – akrun Aug 17 '20 at 21:09
  • @r2evans I'd say it's a typo. Similar to this old question: [Merging two data frames by dates that have different types](https://stackoverflow.com/q/25128173) – M-- Aug 20 '20 at 18:37

1 Answers1

3

Confirmed with your dput output:

class(df1$Day)
# [1] "POSIXlt" "POSIXt" 
class(df2$Day)
# [1] "Date"

You need to convert one to the other's class, perhaps df1$Day is the same time-of-day for each value (in this set), then you can do

merge(df1, df2, by = "Day", all.y = TRUE)
#          Day Max Min                Date Temp Hour
# 1 2019-06-15  NA  NA 2019-06-15 14:00:00 22.8   14
# 2 2019-06-15  NA  NA 2019-06-15 14:30:00 22.4   14
# 3 2019-06-15  NA  NA 2019-06-15 15:00:00 22.4   15
# 4 2019-06-15  NA  NA 2019-06-15 15:30:00 22.3   15
# 5 2019-06-15  NA  NA 2019-06-15 16:00:00 21.9   16
# 6 2019-06-15  NA  NA 2019-06-15 16:30:00 21.3   16

df1$Day <- as.Date(df1$Day)

merge(df1, df2, by = "Day", all.y = TRUE)
#          Day  Max  Min                Date Temp Hour
# 1 2019-06-15 23.8 14.4 2019-06-15 14:00:00 22.8   14
# 2 2019-06-15 23.8 14.4 2019-06-15 14:30:00 22.4   14
# 3 2019-06-15 23.8 14.4 2019-06-15 15:00:00 22.4   15
# 4 2019-06-15 23.8 14.4 2019-06-15 15:30:00 22.3   15
# 5 2019-06-15 23.8 14.4 2019-06-15 16:00:00 21.9   16
# 6 2019-06-15 23.8 14.4 2019-06-15 16:30:00 21.3   16

I'll go out on a limb and say that the class of your Day columns is different.

Going with "raw data" as copied from the question, Day will be strings for both frames:

df1 <- read.table(header = TRUE, text = "
Day MaxTemp MinTemp
2019-06-15 23.8 14.4
2019-06-16 24.9 11.7
2019-06-17 23.2 8.7")

df2 <- read.table(header = TRUE, text = "
Day Hour Temp
2019-06-15 14 22.8
2019-06-15 15 22.4
2019-06-15 16 21.9")

str(lapply(df1, class))
# List of 3
#  $ Day    : chr "character"
#  $ MaxTemp: chr "numeric"
#  $ MinTemp: chr "numeric"
merge(df1, df2, by = "Day")
#          Day MaxTemp MinTemp Hour Temp
# 1 2019-06-15    23.8    14.4   14 22.8
# 2 2019-06-15    23.8    14.4   15 22.4
# 3 2019-06-15    23.8    14.4   16 21.9

If I convert one of them to a Date class:

df1$Day <- as.Date(df1$Day)
str(lapply(df1, class))
# List of 3
#  $ Day    : chr "Date"
#  $ MaxTemp: chr "numeric"
#  $ MinTemp: chr "numeric"

merge(df1, df2, by = "Day", all.y = TRUE)
#          Day MaxTemp MinTemp Hour Temp
# 1 2019-06-15      NA      NA   14 22.8
# 2 2019-06-15      NA      NA   15 22.4
# 3 2019-06-15      NA      NA   16 21.9

Fixes include:

  1. Converting the other frame's Day to a date:

    df2$Day <- as.Date(df2$Day)
    merge(df1, df2, by = "Day", all.y = TRUE)
    #          Day MaxTemp MinTemp Hour Temp
    # 1 2019-06-15    23.8    14.4   14 22.8
    # 2 2019-06-15    23.8    14.4   15 22.4
    # 3 2019-06-15    23.8    14.4   16 21.9
    
  2. Converting both Day columns back to character (or factor):

    df1$Day <- as.character(df1$Day)
    df2$Day <- as.character(df2$Day)
    merge(df1, df2, by = "Day", all.y = TRUE)
    #          Day MaxTemp MinTemp Hour Temp
    # 1 2019-06-15    23.8    14.4   14 22.8
    # 2 2019-06-15    23.8    14.4   15 22.4
    # 3 2019-06-15    23.8    14.4   16 21.9
    

    Though in this case it's likely (and perhaps even recommended) that you convert them back to Date at some point (since it is a numeric data type, after all).

r2evans
  • 141,215
  • 6
  • 77
  • 149