Replacing NA with duplicates from adjacent row R

Question

I have a dataset that contains features Date, Age and Customer_ID. Some of the rows in Age have missing values (NAs) in them and I would like to impute them.

Here is some example data:

Date <- c("201101", "201102", "201101", "201102", "201103")
Age <- c("12-17", "12-17", "30-35", NA, NA)
Customer_ID <- c("1234", "1234", "5678", "5678", "5678")
df <- data.frame(Date, Age, Customer_ID)

Date      Age      Customer_ID
201101    12-17    1234
201102    12-17    1234
201101    30-35    5678
201102    NA       5678
201103    NA       5678

I would like to replace the NAs in Age with 30-35.

So for all NAs, it has to check whether there is another row with the same Customer_ID and replace the NA with the value for Age stated in the other row.

Any ideas on how to do this? Thanks.

@SamFirke You are right, thanks for the link. – kiae Sep 16 '16 at 17:52 — kiae, Sep 16 '16 at 17:52

Psidom · Accepted Answer · 2016-09-16T17:36:15.687

You can use the fill function from tidyr. It is a tidyr function for last observation carried forward, i.e, fill NA values with previous non-NA value. In order for this to work, you can use arrange to sort the column2 which sorts the NA values behind non-NAs and then you can group by customer ID and fill the Age column:

library(dplyr)
library(tidyr)
df %>% arrange(Age) %>% group_by(Customer_ID) %>% fill(Age)

# Source: local data frame [5 x 3]
# Groups: Customer_ID [2]

#      Date    Age Customer_ID
#    <fctr>  <fctr>  <fctr>
# 1  201101   12-17    1234
# 2  201102   12-17    1234
# 3  201101   30-35    5678
# 4  201102   30-35    5678
# 5  201103   30-35    5678

score 1 · Answer 2 · answered Sep 16 '16 at 17:54

1

With base R:

lookup <- unique(df[!is.na(df$Age),][c('Customer_ID', 'Age')])
df[is.na(df$Age),]$Age <- lookup[match(df[is.na(df$Age),]$Customer_ID,
                                                lookup$Customer_ID),]$Age

answered Sep 16 '16 at 17:54

Sandipan Dey

21,482
2
51
63

Replacing NA with duplicates from adjacent row R

2 Answers2