2

Raw data:

    V1 V2
1   c1  a
2   c2  b
3 <NA>  c
4 <NA>  d
5   c3  e
6 <NA>  f
7   c4  g

Reproducible Sample Data

V1 = c('c1','c2',NA,NA,'c3',NA,'c4')
V2 = c('a','b','c','d','e','f','g')

df <- data.frame(V1, V2)

Expected output

  V1_after V2_after
1       c1        a
2       c2    b c d
3       c3      e f
4       c4        g

V1_after <- c('c1','c2','c3','c4')
V2_after <- c('a',paste('b','c','d'),paste('e','f'),'g')

data.frame(V1_after,V2_after)

This is sample data. In the real data, the rows where V1 is NA do not follow a regular pattern.

This is too difficult for me.

younghyun

3 Answers

3

You could use zoo::na.locf for this ("last observation carried forward"). It takes the most recent non-NA value and fills each NA that follows with it:

library(dplyr)
library(zoo)

df %>%
  mutate(V1 = zoo::na.locf(V1)) %>%
  group_by(V1) %>%
  summarise(V2 = paste0(V2, collapse = " "))

# A tibble: 4 x 2
  V1    V2   
  <chr> <chr>
1 c1    a    
2 c2    b c d
3 c3    e f  
4 c4    g 
Anoushiravan R
  • 1
    I was wondering whether there was a function like `nafill` in `data.table`, and then I saw your `na.locf` function, cheers! – ThomasIsCoding Aug 06 '21 at 07:04
  • 1
    This is a very good and efficient function and the only one I know from this package lol. But `zoo` is normally used for rolling computations the same way as `slider` and `runner`. – Anoushiravan R Aug 06 '21 at 07:24
  • 1
    Use na.locf0 (with a zero on the end); it ensures that the result is the same length as the input. – G. Grothendieck Aug 06 '21 at 13:28
  • Thank you very much, Mr. @G.Grothendieck. I think this one is even more useful. I also read your explanation of it under another question to understand it. – Anoushiravan R Aug 06 '21 at 16:53
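
To illustrate the na.locf0 suggestion from the comments, here is a minimal sketch, assuming a vector that starts with NA. The vector x is made up for illustration and is not the asker's data. With its default na.rm = TRUE, na.locf drops leading NAs, so its result can be shorter than the input, and mutate() would then fail on the length mismatch. na.locf0 keeps the leading NA in place:

```r
library(zoo)

# Hypothetical input whose first value is missing
x <- c(NA, "c1", NA, "c2")

# na.locf0 always returns a vector as long as its input;
# the leading NA has nothing to carry forward, so it stays NA
filled0 <- na.locf0(x)
filled0
# [1] NA   "c1" "c1" "c2"

# na.locf with the default na.rm = TRUE removes the leading NA
length(na.locf(x))   # 3, one shorter than x
length(filled0)      # 4
```

The sample data in the question starts with a non-NA value, so both functions should give the same result there; the difference only matters when the first rows can be missing.
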
3

A base R option using na.omit + cumsum + aggregate

aggregate(
  V2 ~ .,
  transform(
    df,
    V1 = na.omit(V1)[cumsum(!is.na(V1))]
  ), c
)

gives

  V1      V2
1 c1       a
2 c2 b, c, d
3 c3    e, f
4 c4       g
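
For readers wondering how the cumsum indexing above works, the following sketch breaks it into steps, using the sample data from the question. The intermediate names grp and filled are introduced here for illustration only. It also swaps c for paste with collapse = " " as the aggregation function, which is a variation and not part of the answer as posted, so that V2 comes out as plain strings like those in the expected output:

```r
V1 <- c('c1','c2',NA,NA,'c3',NA,'c4')
V2 <- c('a','b','c','d','e','f','g')
df <- data.frame(V1, V2)

# Running count of non-NA values: each NA row inherits the count
# of the last non-NA row above it
grp <- cumsum(!is.na(df$V1))
grp
# [1] 1 2 2 2 3 3 4

# Use that count as an index into the non-NA values only
filled <- na.omit(df$V1)[grp]
filled
# [1] "c1" "c2" "c2" "c2" "c3" "c3" "c4"

# Collapse V2 within each filled group into a single string
out <- aggregate(V2 ~ V1, data = transform(df, V1 = filled),
                 FUN = paste, collapse = " ")
out
#   V1    V2
# 1 c1     a
# 2 c2 b c d
# 3 c3   e f
# 4 c4     g
```

Note that this indexing assumes the first value of V1 is not NA. A leading NA would produce index 0, which R silently drops, so the filled vector would end up shorter than the data.
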
ThomasIsCoding
2

You can fill the NA with the previous non-NA values and summarise the data.

library(dplyr)
library(tidyr)

df %>%
  fill(V1) %>%
  group_by(V1) %>%
  summarise(V2 = paste(V2, collapse = ' '))

#   V1    V2   
#  <chr> <chr>
#1 c1    a    
#2 c2    b c d
#3 c3    e f  
#4 c4    g    
Ronak Shah