2

I want to convert NAs to a certain value, based on the given variable value for that specific ID. Sample query: df1 ---> df2

df1 = data.frame(ID=c(1,1, 1, 1, 2,2,2,2,3,3,3,3),WHR=c(0.8,NA, NA, NA,1.0, NA, NA,NA,1.1, NA, NA, NA))

df2=data.frame(ID=c(1,1, 1, 1, 2,2,2,2,3,3,3,3),WHR=c(0.8,0.8, 0.8, 0.8,1.0, 1.0,1.0,1.0, 1.1, 1.1,1.1,1.1))`

What I tried

R fill in NA with previous row value with condition:

library(xts)
df1[,WHR:=na.locf("WHR", fromlast=TRUE, by = ID)` 

Got error:

could not find function ":="; 

I used this code because I have hundreds of ID values and I would like an automatic code that changes NAs in a particular column based on ID.

How can I convert df1 to df2? (pls explain your code as well, so it may help other beginner users). Thank you!

Community
  • 1
  • 1
Aby
  • 167
  • 1
  • 3
  • 16
  • You need to install and load the `data.table` package and convert your data to a `data.table` object (for example using `setDT(df1)`) for the code to work – talat Mar 18 '16 at 12:43
  • 2
    Possible duplicate of [Replacing NAs with latest non-NA value](http://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value) – Ronak Shah Mar 18 '16 at 12:44
  • sorry, did not work @mtoto – Aby Mar 18 '16 at 12:47
  • @docendodiscimus Loaded `data.table` and ran `df1[,WHR:=na.locf(WHR, fromlast=TRUE), by = ID]`, now getting another error: unused argument (by = ID) – Aby Mar 18 '16 at 12:48
  • Did you run `setDT(df1)`? – talat Mar 18 '16 at 12:57
  • Thanks, @docendo! Query has been solved by this code: `df2$WHR.Comp = ave(df1$WHR, cumsum(!is.na(df1$WHR)), FUN=function(x) x[1])` – Aby Mar 18 '16 at 13:02
  • This code: `df2$WHR.Comp = ave(df1$WHR, cumsum(!is.na(df1$WHR)), FUN=function(x) x[1])` simply returns the values for ID 1 -3, again and again, for IDs 4 to 100.. not sure what is happening? – Aby Mar 18 '16 at 13:45

3 Answers3

2

Using base R you can broadcast the first non-NA using the cumsum of their locations and the ave function:

df2$WHR.Comp = ave(df1$WHR, cumsum(!is.na(df1$WHR)), FUN=function(x) x[1])

> df2
   ID WHR WHR.Comp
1   1 0.8      0.8
2   1 0.8      0.8
3   1 0.8      0.8
4   1 0.8      0.8
5   2 1.0      1.0
6   2 1.0      1.0
7   2 1.0      1.0
8   2 1.0      1.0
9   3 1.1      1.1
10  3 1.1      1.1
11  3 1.1      1.1
12  3 1.1      1.1
Zelazny7
  • 39,946
  • 18
  • 70
  • 84
  • @Zelazyn7 - I noticed this code works for first three IDs only, it replicates 0.8, 1.0, 1.1 for further IDs, which it should not. Why is it so for the next IDs? – Aby Mar 18 '16 at 13:41
  • are you referencing the correct data.frames? If it is repeating those values, my guess is that you are still referencing the test data.frame and the results are being recycled. – Zelazny7 Mar 18 '16 at 13:50
  • there was an issue with column name, e.g., `df2$WHR.Comp` vs `df2$`WHR.Comp``, if the column name had single inverted commas, then code did not work – Aby Mar 19 '16 at 08:05
1

You could construct a mapping for the missing values:

idx <- !is.na(df1[,"WHR"])
map <- setNames(df1[idx,"WHR"], df1[idx,"ID"])

and then apply this map to the NA values

df2[!idx, "WHR2"] <- map[df2[!idx, "ID"]]
Karsten W.
  • 17,826
  • 11
  • 69
  • 103
  • Thanks, Karsten! @Zelazyn7 solved the query. Your code changes the values of WHR (to 0.8 1.0 1.0 1.0 1.0 2.0 2.0 2.0 1.1 3.0 3.0 3.0). Never mind, but wonder why? – Aby Mar 18 '16 at 12:59
  • Sorry, wrong argument order to `setNames`. Fixed it. Thanks for trying it out! – Karsten W. Mar 18 '16 at 13:11
  • No issues. This code: `df2$WHR.Comp = ave(df1$WHR, cumsum(!is.na(df1$WHR)), FUN=function(x) x[1])` worked but it simply returns the values of ID 1 -3, again and again, for IDs 4 to 100.. not sure what is happening? – Aby Mar 18 '16 at 13:46
0

mean by ID:

for (i in unique(df1$ID)) df1[df1$ID==i & is.na(df1[,'WHR']),'WHR'] <- mean(df2[df2$ID==i,'WHR'])
RustamA
  • 91
  • 3