I have two columns: Time, and Value. Time is continuous and does not have any blanks. Value, however, comprise data that had been sampled at random points and thus have random lengths of data gaps in between values.
Here's a very simple sample dataset:
df <-data.frame(Time=1:10, Value=c("2", NA, NA, NA, "6", NA, NA, "7", NA, "3"))
I would like to create a third column, "Estimate". Under this new column:
- If Value is not blank, then Estimate = Value.
- If Value is blank, then: (this is the part where I am having trouble!)
- Under Value, find the last non-NA value, find the next non-NA value, and find the difference
- Take the difference and divide it by the number of empty rows from the first value to and including the next value
- Using that number, fill in the Estimate column incrementally from that last non-NA value to the next non-NA value
For example, for Time 2
- First value = 2 (from Time 1)
- Next value = 6 (from Time 5)
- Difference = 4
- Increment = 4 / 4 (because there are 3 blank spaces + Time 5 with the recorded value)
- Therefore, Estimate at Time 2 would be first value + increment = 2 + 1 = 3
- Estimate at Time 3 would then be 4, and Time 4 would have an Estimate of 5. Similarly, Time 6 would be 6.33, Time 7 would be 6.66, and then Time 8 has an actual value of 7.
Essentially I'm just making an equally-weighted transition from the first value to the next value. I'm not concerned with anything before the first value or last value (if there were NAs before Time 1 or after Time 10).
THE QUESTION:
Being very much a newbie, I'm not quite sure how to best approach coding for the Estimate column, when Value is blank. I've tried generating a vector of row numbers for rows with actual values, thinking I could use that as index reference. I then tried to do a loop where it would take row A and row B (from the vector of row numbers), calculate the increment, then add the increment to the last cell. However, I couldn't figure out how to make both A & B increase by 1 at the same time (such that it did a "rolling window" down my vector of row numbers). I also suspect this is not a good way of approaching this problem...but don't know what my options are.
Any guidance and pointing in the right direction would be greatly appreciated!