Replace NA with previous value for certain column in R

Question

I want to convert NAs to a certain value, based on the given variable value for that specific ID. Sample query: df1 ---> df2

df1 = data.frame(ID=c(1,1, 1, 1, 2,2,2,2,3,3,3,3),WHR=c(0.8,NA, NA, NA,1.0, NA, NA,NA,1.1, NA, NA, NA))

df2=data.frame(ID=c(1,1, 1, 1, 2,2,2,2,3,3,3,3),WHR=c(0.8,0.8, 0.8, 0.8,1.0, 1.0,1.0,1.0, 1.1, 1.1,1.1,1.1))`

What I tried

R fill in NA with previous row value with condition:

library(xts)
df1[,WHR:=na.locf("WHR", fromlast=TRUE, by = ID)`

Got error:

could not find function ":=";

I used this code because I have hundreds of ID values and I would like an automatic code that changes NAs in a particular column based on ID.

How can I convert df1 to df2? (pls explain your code as well, so it may help other beginner users). Thank you!

You need to install and load the `data.table` package and convert your data to a `data.table` object (for example using `setDT(df1)`) for the code to work — talat, Mar 18 '16 at 12:43
Possible duplicate of [Replacing NAs with latest non-NA value](http://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value) — Ronak Shah, Mar 18 '16 at 12:44
@docendodiscimus Loaded `data.table` and ran `df1[,WHR:=na.locf(WHR, fromlast=TRUE), by = ID]`, now getting another error: unused argument (by = ID) — Aby, Mar 18 '16 at 12:48
Thanks, @docendo! Query has been solved by this code: `df2$WHR.Comp = ave(df1$WHR, cumsum(!is.na(df1$WHR)), FUN=function(x) x[1])` — Aby, Mar 18 '16 at 13:02
This code: `df2$WHR.Comp = ave(df1$WHR, cumsum(!is.na(df1$WHR)), FUN=function(x) x[1])` simply returns the values for ID 1 -3, again and again, for IDs 4 to 100.. not sure what is happening? — Aby, Mar 18 '16 at 13:45

score 2 · Accepted Answer · answered Mar 18 '16 at 12:43

2

Using base R you can broadcast the first non-NA using the cumsum of their locations and the ave function:

df2$WHR.Comp = ave(df1$WHR, cumsum(!is.na(df1$WHR)), FUN=function(x) x[1])

> df2
   ID WHR WHR.Comp
1   1 0.8      0.8
2   1 0.8      0.8
3   1 0.8      0.8
4   1 0.8      0.8
5   2 1.0      1.0
6   2 1.0      1.0
7   2 1.0      1.0
8   2 1.0      1.0
9   3 1.1      1.1
10  3 1.1      1.1
11  3 1.1      1.1
12  3 1.1      1.1

answered Mar 18 '16 at 12:43

Zelazny7

39,946
18
70
84

@Zelazyn7 - I noticed this code works for first three IDs only, it replicates 0.8, 1.0, 1.1 for further IDs, which it should not. Why is it so for the next IDs? – Aby Mar 18 '16 at 13:41
are you referencing the correct data.frames? If it is repeating those values, my guess is that you are still referencing the test data.frame and the results are being recycled. – Zelazny7 Mar 18 '16 at 13:50
there was an issue with column name, e.g., `df2$WHR.Comp` vs `df2$`WHR.Comp``, if the column name had single inverted commas, then code did not work – Aby Mar 19 '16 at 08:05

Karsten W. · Answer 2 · 2016-03-18T13:10:33.550

1

You could construct a mapping for the missing values:

idx <- !is.na(df1[,"WHR"])
map <- setNames(df1[idx,"WHR"], df1[idx,"ID"])

and then apply this map to the NA values

df2[!idx, "WHR2"] <- map[df2[!idx, "ID"]]

edited Mar 18 '16 at 13:10

answered Mar 18 '16 at 12:55

Karsten W.

17,826
11
69
103

Thanks, Karsten! @Zelazyn7 solved the query. Your code changes the values of WHR (to 0.8 1.0 1.0 1.0 1.0 2.0 2.0 2.0 1.1 3.0 3.0 3.0). Never mind, but wonder why? – Aby Mar 18 '16 at 12:59
Sorry, wrong argument order to `setNames`. Fixed it. Thanks for trying it out! – Karsten W. Mar 18 '16 at 13:11
No issues. This code: `df2$WHR.Comp = ave(df1$WHR, cumsum(!is.na(df1$WHR)), FUN=function(x) x[1])` worked but it simply returns the values of ID 1 -3, again and again, for IDs 4 to 100.. not sure what is happening? – Aby Mar 18 '16 at 13:46

score 0 · Answer 3 · answered Mar 18 '16 at 15:37

0

mean by ID:

for (i in unique(df1$ID)) df1[df1$ID==i & is.na(df1[,'WHR']),'WHR'] <- mean(df2[df2$ID==i,'WHR'])

answered Mar 18 '16 at 15:37

RustamA

91
3

Replace NA with previous value for certain column in R

3 Answers3

Linked

Related