1

I have two distinct dataframes, both with a time information column, with different time intervals. The first df1 has time intervals in seconds (~6s) and the other (df2) has time intervals of 10min. I would like to merge both dataframes, keeping the information from both df, repeating df2 values within the time range in df1. Like this:

df1

  x   y   z     time
-52 -39 -35 06:08:03
-47 -57 -36 06:08:08
-39   2 -40 06:08:13
-45 -23 -29 06:10:20
-51 -11 -31 06:10:29
-69 -28 -19 06:20:34

df2

time        Temp.ar  Ur ar  Vel. Vento
06:00:00    14.79    78.5   1.147
06:10:00    14.74    78.9   1.045
06:20:00    14.9     78.9   1.009
06:30:00    15.14    78.6   1.076
06:40:00    15.32    77.8   1.332
06:50:00    15.6     76.5   1.216   

output that I want

 x   y   z      time  Temp.ar   Ur ar   Vel. Vento
-52 -39 -35 06:08:03  14.79     78.5    1.147
-47 -57 -36 06:08:08  14.79     78.5    1.147
-39   2 -40 06:08:13  14.79     78.5    1.147
-45 -23 -29 06:10:20  14.74     78.9    1.045
-51 -11 -31 06:10:29  14.74     78.9    1.045
-69 -28 -19 06:20:34  14.9      78.9    1.009

Time column is already in "POSIXct" format.

  • What mapping do you want to use? Is any time from `06:00:00` to `06:10:00` in `df1` a match to the `06:00:00` time in `df2`? I guess that depends on whether `df2` has point estimates of location or an average and, if an average, over what times. – David_O Dec 11 '19 at 12:53
  • Provide a proper [reprex](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It helps who is answering your question and, so, it helps you too. – Gabriel M. Silva Dec 11 '19 at 12:56
  • No, the time in df1 starts at a different time than df2, so I would have to take intervals, for example from 06:00:00 until 06:10:00 I must repeat the values of the other variables (Temp.ar, Ur ar and Vel. Vento) from 06:10:00 to df2, and so on. – Rafael Nakamura Dec 11 '19 at 13:06

3 Answers3

2

Probably the most generalizable approach is to define a set of time windows and then use findInterval to locate the index of the time in each data frame. You can then use merge to bring the two together:

# This is what Gabriel means by a reprex - if you provide the data in 
# loadable form it is much easier to help
df1 <- read.table(text="  x   y   z     time
-52 -39 -35 06:08:03
-47 -57 -36 06:08:08
-39   2 -40 06:08:13
-45 -23 -29 06:10:20
-51 -11 -31 06:10:29
-69 -28 -19 06:20:34", header=TRUE, stringsAsFactors=FALSE)

df2 <- read.table(text="time        Temp.ar  Ur.ar  Vel.Vento
06:00:00    14.79    78.5   1.147
06:10:00    14.74    78.9   1.045
06:20:00    14.9     78.9   1.009
06:30:00    15.14    78.6   1.076
06:40:00    15.32    77.8   1.332
06:50:00    15.6     76.5   1.216", header=TRUE, stringsAsFactors=FALSE)

df1$time <- strptime(df1$time, '%H:%M:%S')
df2$time <- strptime(df2$time, '%H:%M:%S')

# I'm just using the existing sequence in df2 as the time windows, but
# you could set up different ones
df1$interval <- findInterval(df1$time, df2$time)
df2$interval <- findInterval(df2$time, df2$time)

df3 <- merge(df1, df2, by='interval')

There are some extra columns in there (the times from both df1 and df2) but you can subset those out. They are a useful check it has worked though.

David_O
  • 1,143
  • 7
  • 16
2

You can use a rolling join

library(data.table)
setDT(df1)
setDT(df2)

df2[df1, on = .(time), roll = TRUE]

#                   time Temp.ar Ur.ar Vel.Vento   x   y   z
# 1: 2019-12-11 06:08:03   14.79  78.5     1.147 -52 -39 -35
# 2: 2019-12-11 06:08:08   14.79  78.5     1.147 -47 -57 -36
# 3: 2019-12-11 06:08:13   14.79  78.5     1.147 -39   2 -40
# 4: 2019-12-11 06:10:20   14.74  78.9     1.045 -45 -23 -29
# 5: 2019-12-11 06:10:29   14.74  78.9     1.045 -51 -11 -31
# 6: 2019-12-11 06:20:34   14.90  78.9     1.009 -69 -28 -19

Data used

df1 <- fread('
x   y   z     time
-52 -39 -35 06:08:03
-47 -57 -36 06:08:08
-39   2 -40 06:08:13
-45 -23 -29 06:10:20
-51 -11 -31 06:10:29
-69 -28 -19 06:20:34
')

df2 <- fread('
time        Temp.ar  Ur.ar  Vel.Vento
06:00:00    14.79    78.5   1.147
06:10:00    14.74    78.9   1.045
06:20:00    14.9     78.9   1.009
06:30:00    15.14    78.6   1.076
06:40:00    15.32    77.8   1.332
06:50:00    15.6     76.5   1.216
')
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
1

With base R, here are provided with two approaches that may help you to make it,

  • using findInterval():
df <- `row.names<-`(cbind(df1,df2[findInterval(df1$time, df2$time),-1]),rownames(df1))
  • using which.max():
df <- `row.names<-`(cbind(df1,
                          df2[sapply(df1$time, 
                                     function(x) which.max(df2$time >= x)-1),-1]),rownames(df1)) 

which gives

> df
    x   y   z                time Temp.ar Ur.ar Vel.Vento
1 -52 -39 -35 2019-12-11 06:08:03   14.79  78.5     1.147
2 -47 -57 -36 2019-12-11 06:08:08   14.79  78.5     1.147
3 -39   2 -40 2019-12-11 06:08:13   14.79  78.5     1.147
4 -45 -23 -29 2019-12-11 06:10:20   14.74  78.9     1.045
5 -51 -11 -31 2019-12-11 06:10:29   14.74  78.9     1.045
6 -69 -28 -19 2019-12-11 06:20:34   14.90  78.9     1.009

DATA

df1 <- structure(list(x = c(-52L, -47L, -39L, -45L, -51L, -69L), y = c(-39L, 
-57L, 2L, -23L, -11L, -28L), z = c(-35L, -36L, -40L, -29L, -31L, 
-19L), time = structure(list(sec = c(3, 8, 13, 20, 29, 34), min = c(8L, 
8L, 8L, 10L, 10L, 20L), hour = c(6L, 6L, 6L, 6L, 6L, 6L), mday = c(11L, 
11L, 11L, 11L, 11L, 11L), mon = c(11L, 11L, 11L, 11L, 11L, 11L
), year = c(119L, 119L, 119L, 119L, 119L, 119L), wday = c(3L, 
3L, 3L, 3L, 3L, 3L), yday = c(344L, 344L, 344L, 344L, 344L, 344L
), isdst = c(0L, 0L, 0L, 0L, 0L, 0L), zone = c("CET", "CET", 
"CET", "CET", "CET", "CET"), gmtoff = c(NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), class = c("POSIXlt", 
"POSIXt"))), row.names = c(NA, -6L), class = "data.frame")

df2 <- structure(list(time = structure(list(sec = c(0, 0, 0, 0, 0, 0
), min = c(0L, 10L, 20L, 30L, 40L, 50L), hour = c(6L, 6L, 6L, 
6L, 6L, 6L), mday = c(11L, 11L, 11L, 11L, 11L, 11L), mon = c(11L, 
11L, 11L, 11L, 11L, 11L), year = c(119L, 119L, 119L, 119L, 119L, 
119L), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(344L, 344L, 
344L, 344L, 344L, 344L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L), zone = c("CET", 
"CET", "CET", "CET", "CET", "CET"), gmtoff = c(NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), class = c("POSIXlt", 
"POSIXt")), Temp.ar = c(14.79, 14.74, 14.9, 15.14, 15.32, 15.6
), Ur.ar = c(78.5, 78.9, 78.9, 78.6, 77.8, 76.5), Vel.Vento = c(1.147, 
1.045, 1.009, 1.076, 1.332, 1.216)), row.names = c(NA, -6L), class = "data.frame")
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81