0

I'm new to R so forgive me if the answer is obvious. I've also tried to search for an answer, but I think I'm not using the right terms.

I have two dataframes, each consisting of a datetime and a value

e.g. Data Frame 1:

2003-01-01 10:00:00 | 10
2003-01-02 10:00:00 | 5
2003-01-03 10:00:00 | 7
 ...<snip>...
2003-06-15 10:00:00 | 4.5
2003-06-16 10:00:00 | 4.5
2003-06-17 10:00:00 | 3.5
 ...<snip>...
2003-11-21 10:00:00 | 3.5
2003-11-22 10:00:00 | 4
2003-11-23 10:00:00 | 4.5

and Data Frame 2:

2003-01-01 09:00:00 | 2
2003-03-19 12:00:00 | 5
2003-05-14 14:00:00 | 3.5
2003-06-10 14:00:00 | 4
 ...<snip>...
2003-10-20 14:00:00 | 2
2003-11-22 14:00:00 | 3

What I would like to do is add the values together if the timestamp in the first dataframe is within the timestamps of the second

e.g.

2003-01-01 10:00:00 is between 2003-01-01 09:00:00 and 2003-03-19 12:00:00, so calculation to be performed is 10 + 2.

[Removed inconsistent statement]

I'm assuming there is an easy way of doing this in R. As a programmer, my first instinct is to just use for-loops.

EDIT: What I'd like is something like the below

    timestamp          | measurement | correction | corrected
   2003-01-01 10:00:00 | 10          | 2          | 12   
   2003-01-02 10:00:00 | 5           | 2          | 7
   2003-01-03 10:00:00 | 7           | 2          | 9
         ...<snip>...
   2003-06-15 10:00:00 | 4.5         | 4          | 8.5
   2003-06-16 10:00:00 | 4.5         | 4          | 8.5
   2003-06-17 10:00:00 | 3.5         | 4          | 7.5
         ...<snip>...
   2003-11-21 10:00:00 | 3.5         | 2          | 5.5
   2003-11-22 10:00:00 | 4           | 2          | 6
   2003-11-23 10:00:00 | 4.5         | 3          | 7.5

Really the important thing is getting the corrected value. I've got it (sort of) working in multiple for-loops, but I want to be able to do it the "R" way.

 Time from DF1            Time from DF2      Calculation 
2003-11-21 10:00:00 >= 2003-10-20 14:00:00 = 3.5 + 2
2003-11-22 10:00:00 >= 2003-10-20 14:00:00 = 4   + 2
2003-11-23 10:00:00 >= 2003-11-22 14:00:00 = 4.5 + 3

EDIT 2:

I got it working with a loop. Is there a better way of doing it?

library(plyr)
library(lubridate)

df_measurements <- read.table(text = "
2003-01-01 10:00:00 | 10
2003-01-02 10:00:00 | 5
2003-01-03 10:00:00 | 7
2003-06-15 10:00:00 | 4.5
2003-06-16 10:00:00 | 4.5
2003-06-17 10:00:00 | 3.5
2003-11-21 10:00:00 | 3.5
2003-11-22 10:00:00 | 4
2003-11-23 10:00:00 | 4.5", sep = "|")

df_corrections <- read.table(text = "
2003-01-01 09:00:00 | 5.5
2003-05-01 09:00:00 | 6
2003-08-01 09:00:00 | 8", sep = "|")

#Create named columns and remove unneeded
df_measurements$time <- ymd_hms(df_measurements$V1)
df_measurements$obs <- df_measurements$V2
df_measurements$V1 <- NULL
df_measurements$V2 <- NULL

df_corrections$time <- ymd_hms(df_corrections$V1)
df_corrections$offset <- df_corrections$V2
df_corrections$V1 <- NULL
df_corrections$V2 <- NULL

#Get number of corrections
c_length <- nrow(df_corrections)

#Create blank data frame to merge results into
result <- data.frame(time=as.Date(character()), obs=numeric(), correction=numeric(), corrected=numeric(), stringsAsFactors=FALSE )

for(i in c(1:c_length)) {

  if(i < c_length) {

    subset_m <- df_measurements[df_measurements$time >= df_corrections$time[[i]] & df_measurements$time < df_corrections$time[[i+1]], ]
  } else {

    #Last correction in correction data frame
    subset_m <- df_measurements[df_measurements$time >= df_corrections$time[[i]], ]
  }

  #Make "correction" column and fill with correction to be used
  subset_m[, "correction"] <- rep(df_corrections$offset[[i]], nrow(subset_m)) 

  #Make "corrected" column and fill with corrected value
  subset_m$corrected <- subset_m$correction + subset_m$obs  

  #Combine subset with result
  result <- rbind(result, subset_m)

}

print(result)
ConnectedSystems
  • 892
  • 10
  • 19
  • I am still unclear about what you want. where did `2003-05-01 09:00:00` come from? – RJ- Aug 27 '13 at 02:15
  • "the last three records are after the last datetime in DataFrame 2". In which way does 2003-11-21 10:00:00 and 2003-11-22 10:00:00 come after 2003-11-22 14:00:00? On the other hand, 2003-11-23 10:00:00 does come after 2003-11-22 14:00:00. But now the '+5 rule' does not apply and instead 3 is added. Please clarify your question and make sure your examples are consistent. – Henrik Aug 27 '13 at 03:13
  • @Henrik I think you're looking at it the wrong way round. 2003-10-20 14:00:00 – ConnectedSystems Aug 27 '13 at 07:14
  • @Henrik Accidentally hit enter, and then took longer than 5 mins to edit my comment. I'll make changes to my question instead – ConnectedSystems Aug 27 '13 at 07:20
  • @RJ, I made some changes to clarify the question. – ConnectedSystems Aug 28 '13 at 01:02

1 Answers1

0

NB: this answer refers to the original question, which has been edited after I posted a working answer

Is this what you want?

df <- read.table(text = "2003-01-01 10:00:00 | 10
2003-01-02 10:00:00 | 5
2003-01-03 10:00:00 | 7
2003-06-15 10:00:00 | 4.5
2003-06-16 10:00:00 | 4.5
2003-06-17 10:00:00 | 3.5", sep = "|")
df$time <- as.POSIXct(df$V1)

df2 <- read.table(text = "2003-01-01 09:00:00 | 2
2003-05-01 09:00:00 | 5", sep = "|")
df2$time <- as.POSIXct(df2$V1)

df$val <- with(df, ifelse(df$time >= df2$time[1] & df$time <= df2$time[2], df$V2 + 2, df$V2 + 5))
Henrik
  • 65,555
  • 14
  • 143
  • 159
  • The contents of the dataframes were just an example. Yes, your answer would work, but only if I wanted to to compare DataFrame 1 with two timestamps. I'll edit my question to make what I want clearer – ConnectedSystems Aug 27 '13 at 00:55
  • I'm sorry you feel that way, and I really do appreciate the time people have spent helping me. That said, your answer was only technically correct, in that it would only work with when comparing against two variables. Plus, the only change to the question I made was to increase the sample size of the example data to clarify the question. Are you against people clarifying the question they are asking? – ConnectedSystems Aug 29 '13 at 23:59