Replace NA in row with value in adjacent row "ROW" not column

Question

Raw data:

    V1 V2
1   c1  a
2   c2  b
3 <NA>  c
4 <NA>  d
5   c3  e
6 <NA>  f
7   c4  g

Reproducible Sample Data

V1 = c('c1','c2',NA,NA,'c3',NA,'c4')
V2 = c('a','b','c','d','e','f','g')

df <- data.frame(V1, V2)

Expected output

  V1_after V2_after
1       c1        a
2       c2    b c d
3       c3      e f
4       c4        g

V1_after <- c('c1','c2','c3','c4')
V2_after <- c('a',paste('b','c','d'),paste('e','f'),'g')

data.frame(V1_after,V2_after)

This is sample data. In the real data, the rows where V1 is NA do not follow a regular pattern.

This is too difficult for me.

Anoushiravan R · Accepted Answer · 2021-08-06T18:12:36.777

3

You could make use of `zoo::na.locf` for this. It fills each NA with the most recent non-NA value before it:

library(dplyr)
library(zoo)

df %>%
  mutate(V1 = zoo::na.locf(V1)) %>%          # carry the last non-NA V1 forward
  group_by(V1) %>%                           # one group per label
  summarise(V2 = paste0(V2, collapse = " ")) # join each group's V2 values

# A tibble: 4 x 2
  V1    V2   
  <chr> <chr>
1 c1    a    
2 c2    b c d
3 c3    e f  
4 c4    g
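To make the "last observation carried forward" idea concrete, here is a minimal base R sketch of what `na.locf` does to `V1`. The helper name `locf` is hypothetical (it is not part of zoo) and assumes a plain atomic vector. It keeps the input length and leaves any leading NA untouched.

```r
# Hypothetical helper (not part of zoo): replace each NA with the value
# just before it, scanning left to right. Because the scan is sequential,
# x[i - 1] has already been filled when x[i] is visited. Leading NAs stay
# NA, so the output always has the same length as the input.
locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

locf(c("c1", "c2", NA, NA, "c3", NA, "c4"))
# [1] "c1" "c2" "c2" "c2" "c3" "c3" "c4"
```

The explicit loop is only for illustration; `zoo::na.locf` / `na.locf0` or `tidyr::fill` are the practical choices.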

edited Aug 06 '21 at 18:12

answered Aug 06 '21 at 06:45

Anoushiravan R



I was wondering whether there is a function like `nafill` in `data.table`, and then I saw your `na.locf` function. Cheers! – ThomasIsCoding Aug 06 '21 at 07:04

This is a very good and efficient function and the only one I know from this package, lol. But `zoo` is mostly used for rolling computations, like `slider` and `runner`. – Anoushiravan R Aug 06 '21 at 07:24

Use `na.locf0` (with a zero on the end) instead; it ensures that the result is the same length as the input. – G. Grothendieck Aug 06 '21 at 13:28

Thank you very much, @G.Grothendieck. I think this one is even more useful. I also read another question where you explained it, to understand it better. – Anoushiravan R Aug 06 '21 at 16:53

score 3 · Answer 2 · answered Aug 06 '21 at 06:53


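A base R option using `na.omit` + `cumsum` + `aggregate`. Here `cumsum(!is.na(V1))` gives each row the position of the most recent non-NA `V1` value, so indexing `na.omit(V1)` with it fills every NA with the label above it: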

aggregate(
  V2 ~ .,
  transform(
    df,
    V1 = na.omit(V1)[cumsum(!is.na(V1))]
  ), c
)

gives

  V1      V2
1 c1       a
2 c2 b, c, d
3 c3    e, f
4 c4       g
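The key step is `na.omit(V1)[cumsum(!is.na(V1))]`. The sketch below breaks it into its two parts and then, as a variation, passes `paste` with `collapse = " "` to `aggregate` so that `V2` comes out as space-separated strings like the expected output, rather than the comma-printed list column above. It assumes the first value of `V1` is not NA; otherwise `cumsum` yields index 0 for the leading rows, indexing with 0 drops them, and the lengths no longer match.

```r
df <- data.frame(
  V1 = c("c1", "c2", NA, NA, "c3", NA, "c4"),
  V2 = c("a", "b", "c", "d", "e", "f", "g"),
  stringsAsFactors = FALSE
)

# Running count of non-NA labels: each row gets the position of the most
# recent non-NA V1 value among the non-NA values.
idx <- cumsum(!is.na(df$V1))
idx
# [1] 1 2 2 2 3 3 4

# Keep only the non-NA labels, then index them with idx so that each
# label is carried forward over the NA rows that follow it.
df$V1 <- as.vector(na.omit(df$V1))[idx]
df$V1
# [1] "c1" "c2" "c2" "c2" "c3" "c3" "c4"

# Extra arguments to aggregate() are passed on to FUN, so each group of
# V2 is collapsed into one space-separated string.
res <- aggregate(V2 ~ V1, data = df, FUN = paste, collapse = " ")
# res$V2 is c("a", "b c d", "e f", "g")
```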

answered Aug 06 '21 at 06:53

ThomasIsCoding


score 2 · Answer 3 · answered Aug 06 '21 at 06:43


You can fill each NA in `V1` with the previous non-NA value using `tidyr::fill()`, then summarise `V2` by group.

library(dplyr)
library(tidyr)

df %>%
  fill(V1) %>%
  group_by(V1) %>%
  summarise(V2 = paste(V2, collapse = ' '))

#   V1    V2   
#  <chr> <chr>
#1 c1    a    
#2 c2    b c d
#3 c3    e f  
#4 c4    g
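For comparison, the same fill-then-summarise result can be sketched in base R without tidyr: build a group index with `cumsum(!is.na(V1))`, then collapse `V2` within each group using `split()` and `vapply()`. This is a swapped-in base R technique, not the tidyr method from the answer, and it assumes `V1` does not start with NA. One difference: grouping by this run index keeps non-adjacent blocks separate even if the same label appears twice in `V1`, whereas `group_by(V1)` would merge them.

```r
df <- data.frame(
  V1 = c("c1", "c2", NA, NA, "c3", NA, "c4"),
  V2 = c("a", "b", "c", "d", "e", "f", "g"),
  stringsAsFactors = FALSE
)

# Group index: increases by one at every non-NA V1, so each NA row joins
# the group of the label above it.
grp <- cumsum(!is.na(df$V1))

out <- data.frame(
  # One row per non-NA label, in original order.
  V1 = df$V1[!is.na(df$V1)],
  # split() orders groups 1, 2, 3, ...; paste() collapses each group of V2.
  V2 = unname(vapply(split(df$V2, grp), paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)
# out$V2 is c("a", "b c d", "e f", "g")
```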

answered Aug 06 '21 at 06:43

Ronak Shah


You beat me by 1 second :P I have to use base R instead, haha – ThomasIsCoding Aug 06 '21 at 06:54
