1

How can you select first non NA value per column of each group using dplyr?

I have a tibble with NAs in different rows per column and group:

tibble(
  group = c(rep(1:3, each = 2), 4),
  x1 = c(1, NA, NA, 4, 5, NA, 7),
  x2 = c(NA, 100, 30, NA, 3, NA, NA)
)

# A tibble: 7 × 3
  group    x1    x2
  <dbl> <dbl> <dbl>
1     1     1    NA
2     1    NA   100
3     2    NA    30
4     2     4    NA
5     3     5     3
6     3    NA    NA
7     4     7    NA

And I want to combine all the first rows per each column and group which is not NA, so that the output looks like:

# A tibble: 4 × 3
  group    x1    x2
  <int> <dbl> <dbl>
1     1     1   100
2     2     4    30
3     3     5     3
4     4     7    NA
moremo
  • 315
  • 2
  • 11

2 Answers2

1

Using summarise and first you could do:

library(dplyr, warn=FALSE)

dat |>
  group_by(group) |>
  summarise(across(c(x1, x2), ~ first(.x[!is.na(.x)])))
#> # A tibble: 4 × 3
#>   group    x1    x2
#>   <dbl> <dbl> <dbl>
#> 1     1     1   100
#> 2     2     4    30
#> 3     3     5     3
#> 4     4     7    NA
stefan
  • 90,330
  • 6
  • 25
  • 51
1

You could also try dplyr::fill and dplyr::slice:

df %>%
  group_by(group) %>%
  fill(x1:x2, .direction = "downup") %>%
  slice(1)

Output

  group    x1    x2
  <dbl> <dbl> <dbl>
1     1     1   100
2     2     4    30
3     3     5     3
4     4     7    NA
jpsmith
  • 11,023
  • 5
  • 15
  • 36