Averaging specific rows of one column based on values of another

Question

I have a dataset of measurements taken with two techniques in succession: half an hour with one technique, then one hour with the other (data below).

I would like to compare the two techniques, so I need a single point (every 1.5 hrs) from each dataset.

I want to obtain a column 5 that holds, at row 2, the average of rows 1 and 3 of column 4. That is, the average of the 16:00 and 17:00 values of column 4 should sit at the 16:30 timestamp, so that I can compare column 2 and column 5 directly.

[Image: table with sample data]

Welcome to SO. Please see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for information on how to improve your question. Sharing data as pictures is not very helpful. — Axeman, Sep 05 '19 at 21:30
Duplicate of [Replace NA with previous and next rows mean in R](https://stackoverflow.com/questions/22916525/replace-na-with-previous-and-next-rows-mean-in-r) — M--, Sep 05 '19 at 21:40
This should be trivial to code, assuming your index is regular without gaps, then we don't even need to inspect timestamp values, we simply want to index row slices `0:2, 3:5, 6:8...`. Or `groupby(index // 3)`. Please post what code you've tried. — smci, Sep 05 '19 at 22:22
@M.: it's a trivially easy question but not a duplicate of that question: OP doesn't want to replace NAs in column 3 ("meastA"?), but create a new column "meastC" where the NAs in meastA are replaced. — smci, Sep 05 '19 at 22:26
a) Please post your data as text, not image. [\[MCVE\]](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) b) Please give your columns names, if only for the clarity of the question. If they don't have names then just pick meaningful names for them (*"average of row 1 and 3 for column 4 at position row 2"* is plain hard to make sense of). The index doesn't need a name. Suggested names: `timestamp, meastA, meastB, meastC` — smci, Sep 05 '19 at 22:27
(*"In each 90min window of timestamp, we have three rows: MeastA samples at +0 and +60 min, and Meast B sample at +30 min. I want to average(?)/(interpolate?) both MeastA values at the +30 min point, into a new column MeastC"*) — smci, Sep 05 '19 at 22:45
Thank you everyone for the comments and solution. Next time will try to post the data in correct format. Among the suggestions, replacing NA from user M works for my purpose. — hharry16, Sep 07 '19 at 16:49

G. Grothendieck · Answer 1 · 2019-09-05T21:40:34.537

Suppose we have DF shown in the Note. Then assuming that you want to replace NAs with a linear interpolation of the nearest non-NAs and simply extend non-NAs to fill in values at the beginning and end:

library(zoo)

replace(DF, 2:3, na.approx(DF[-1], rule = 2))

giving:

##   V1       V2  V3
## 1  A 1.000000 1.0
## 2  B 1.000000 1.5
## 3  C 1.333333 2.0
## 4  D 1.666667 3.0
## 5  E 2.000000 3.5
## 6  F 2.333333 4.0
## 7  G 2.666667 5.0
## 8  H 3.000000 5.0

If you want the average of the nearest non-NAs on either side instead, then use na.locf both forward and backward, average the two results, and finally fill in the extremities using na.fill.

library(zoo)

replace(DF, 2:3, na.fill(
                     (na.locf(DF[-1], na.rm = FALSE) + 
                      na.locf(DF[-1], fromLast = TRUE, na.rm = FALSE))/2, 
                    "extend"))

##   V1  V2  V3
## 1  A 1.0 1.0
## 2  B 1.0 1.5
## 3  C 1.5 2.0
## 4  D 1.5 3.0
## 5  E 2.0 3.5
## 6  F 2.5 4.0
## 7  G 2.5 5.0
## 8  H 3.0 5.0
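For the layout described in the question, the same neighbour-averaging idea can be sketched in base R without zoo. This is only an illustration under assumptions: the data values are made up, and the column names timestamp, meastA, meastB and meastC are the ones suggested in the comments, not the asker's real names. It also assumes a regular index with no gaps, so the row before and the row after each NA hold the two values to average.

```r
# Hypothetical data: meastA is sampled at +0 and +60 min, meastB at +30 min
df <- data.frame(
  timestamp = c("16:00", "16:30", "17:00", "17:30", "18:00", "18:30"),
  meastA = c(10, NA, 14, 20, NA, 26),
  meastB = c(NA, 11, NA, NA, 22, NA)
)

n <- nrow(df)
prev_val <- c(NA, df$meastA[-n])   # meastA shifted down one row (previous value)
next_val <- c(df$meastA[-1], NA)   # meastA shifted up one row (next value)

# Fill meastC only where meastA is missing, using the mean of its two neighbours
df$meastC <- ifelse(is.na(df$meastA), (prev_val + next_val) / 2, NA_real_)

print(df)
```

With this in place, meastB and meastC are both defined on the +30 min rows and can be compared directly.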

Yet another approach is to use splines. See ?na.spline.
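As a rough sketch of what spline-based filling looks like, here is a version that uses base R's spline() as a stand-in rather than zoo's na.spline, so it is not the zoo implementation. The helper name fill_spline and the choice of method = "natural" are assumptions made for illustration. DF is the data frame from the Note.

```r
DF <- data.frame(V1 = head(LETTERS, 8), V2 = c(NA, 1, NA, NA, 2, NA, NA, 3),
  V3 = c(1, NA, 2, 3, NA, 4, 5, NA))

# Hypothetical helper: fit a spline through the non-NA points of v
# and evaluate it at every row position, including the NA ones
fill_spline <- function(v) {
  x <- seq_along(v)
  ok <- !is.na(v)
  spline(x[ok], v[ok], xout = x, method = "natural")$y
}

DF_spline <- DF
DF_spline[2:3] <- lapply(DF[2:3], fill_spline)  # fill V2 and V3 column by column

print(DF_spline)
```

Unlike the linear version with rule = 2, a spline extrapolates at the ends instead of repeating the nearest value, so check the first and last rows before relying on them.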

Note

DF <- data.frame(V1 = head(LETTERS, 8), V2 = c(NA, 1, NA, NA, 2, NA, NA, 3),
  V3 = c(1, NA, 2, 3, NA, 4, 5, NA))

Thank you for the answer, it works for my data. however, will try it with complete dataset — hharry16, Sep 07 '19 at 16:51
