R - Collapsing rows to closest date when NA

Question

Let's suppose I have this dataframe:

Date        A   B
2010-01-01  NA  1
2010-01-02  2   NA
2010-01-05  3   NA
2010-01-07  NA  4
2010-01-20  5   NA
2010-01-25  6   7

I want to collapse rows, removing the NA values to the closest Date. So the result would be:

Date        A  B
2010-01-02  2  1
2010-01-07  3  4
2010-01-20  5  NA
2010-01-25  6  7

I saw this stack overflow that solves collapsing using a key value, but I could not find a similar case using close date values to collapse.

Obs1: It would be good if there was a way to not collapse the rows if the dates are too far apart (example: more than 15 days apart).

Obs2: It would be good if the collapsing lines kept the latter date rather than the earlier, as shown in the example above.

Only collapse where there is NA (so in the example above the answer would be no). If there were more columns it could be the case to collapse more than 2 rows. — Daniel, May 11 '18 at 20:21
Yea. I was thinking on similar line and in solution I tried to collapse until both `A` and `B` got `non-NA` values. — MKR, May 11 '18 at 20:23

MKR · Answer 1 · 2018-05-11T20:31:15.420

0

Using dplyr package an option could be to group_by on combination of A and B in such a way that they form complete values.

Considering Obs#2 the max of Date should be taken for combined row.

library(dplyr)
library(lubridate)
df %>% mutate(Date = ymd(Date)) %>%
  mutate(GrpA = cumsum(!is.na(A)), GrpB = cumsum(!is.na(B))) %>%
  rowwise() %>%
  mutate(Grp = max(GrpA, GrpB)) %>%
  ungroup() %>%
  select(-GrpA, -GrpB) %>%
  group_by(Grp) %>%
  summarise(Date = max(Date), A = A[!is.na(A)][1], B = B[!is.na(B)][1])

# # A tibble: 4 x 4
# Grp Date           A     B
# <int> <date>     <int> <int>
# 1     1 2010-01-02     2     1
# 2     2 2010-01-07     3     4
# 3     3 2010-01-20     5    NA
# 4     4 2010-01-25     6     7

Data:

df <- read.table(text = 
"Date        A   B
2010-01-01  NA  1
2010-01-02  2   NA
2010-01-05  3   NA
2010-01-07  NA  4
2010-01-20  5   NA
2010-01-25  6   7",
stringsAsFactors = FALSE, header = TRUE)

edited May 11 '18 at 20:31

answered May 11 '18 at 20:20

MKR

19,739
4
23
33

This does not work. For example, using the following df, the line where B = 5 disappears: df <- read.table(text = "Date A B 2010-01-01 NA 1 2010-01-02 2 NA 2010-01-05 3 NA 2010-01-06 8 NA 2010-01-07 NA 4 2010-01-08 NA 5 2010-01-20 5 NA", stringsAsFactors = FALSE, header = TRUE) – Daniel May 11 '18 at 20:35
@Daniel That's true. Its not considering that case. Let me re-visit it. – MKR May 11 '18 at 20:43

R - Collapsing rows to closest date when NA

1 Answers1