0

I have two long time series to compare, however, the sampling of them is completely different. The first one is with hourly, the second one is with irregular sampling. I would like to compare Value1 and Value2, so, I would like to select Value1 records from df1 at 02:00 according to df2 dates. How can I solve it in R?

df1:

Date1 Value1
2014-01-01 01:00:00 0.16
2014-01-01 02:00:00 0.13
2014-01-01 03:00:00 0.6
2014-01-02 01:00:00 0.5
2014-01-02 02:00:00 0.22
2014-01-02 03:00:00 0.17
2014-01-19 01:00:00 0.2
2014-01-19 02:00:00 0.11
2014-01-19 03:00:00 0.15
2014-01-21 01:00:00 0.13
2014-01-21 02:00:00 0.33
2014-01-21 03:00:00 0.1
2014-01-23 01:00:00 0.09
2014-01-23 02:00:00 0.02
2014-01-23 03:00:00 0.16

df2:

Date2 Value2
2014-01-01 13
2014-01-19 76
2014-01-23 8

desired output: df_fused:

Date1 Value1 Value2
2014-01-01 02:00:00 0.13 13
2014-01-19 02:00:00 0.11 76
2014-01-23 02:00:00 0.02 8
  • I do not get the connection between your desired output and the sample data provided. Can you explain how to derive thepoutput from df1 and df2? What logic is used? – Wimpel Mar 23 '21 at 12:50
  • I extended the dataset, I hop that is more clear now – Kovács István Péter Mar 23 '21 at 13:39
  • could you please provide the same of your data in a format that is easily reproducible? (see https://stackoverflow.com/questions/5963269) This will increase the likelihood and the speed you will get answers. – Marcelo Avila Mar 23 '21 at 22:56

1 Answers1

1

here is a data.table approach

library( data.table )

#sample data can also be setDT(df1);setDT(df2)
df1 <- fread("Date1     Value1
2014-01-01 01:00:00     0.16
2014-01-01 02:00:00     0.13
2014-01-01 03:00:00     0.6
2014-01-02 01:00:00     0.5
2014-01-02 02:00:00     0.22
2014-01-02 03:00:00     0.17
2014-01-19 01:00:00     0.2
2014-01-19 02:00:00     0.11
2014-01-19 03:00:00     0.15
2014-01-21 01:00:00     0.13
2014-01-21 02:00:00     0.33
2014-01-21 03:00:00     0.1
2014-01-23 01:00:00     0.09
2014-01-23 02:00:00     0.02
2014-01-23 03:00:00     0.16")

df2 <- fread("Date2     Value2
2014-01-01  13
2014-01-19  76
2014-01-23  8")

#set dates to posix
df1[, Date1 := as.POSIXct( Date1, format = "%Y-%m-%d %H:%M:%S", tz = "UTC" )]
#set df2 dates to 02:00:00 time
df2[, Date2 := as.POSIXct( paste0( Date2, "02:00:00" ), format = "%Y-%m-%d %H:%M:%S", tz = "UTC" )]
#join
df2[ df1, Value1 := i.Value1, on = .(Date2 = Date1)][]

#         Date2 Value2 Value1
# 1: 2014-01-01 02:00:00     13   0.13
# 2: 2014-01-19 02:00:00     76   0.11
# 3: 2014-01-23 02:00:00      8   0.02
Wimpel
  • 26,031
  • 1
  • 20
  • 37