1

I have a data set similar to the following format:

enter image description here

Each user has only one variable that is not NA. I want to return the column name of this NA column as follows:

enter image description here

Writing a loop by row may easily solve this problem, but I want to user data.table to generate this variable.

Hongfei Li
  • 59
  • 6

1 Answers1

4

With base R, it would be more efficent

df1$NonNA_VarName <- names(df1)[-1][max.col(!is.na(df1[-1]), 'first')]
df1$NonNA_VarName
#[1] "v1" "v2" "v1" "v3" "v4" "v3"

With data.table, an option is to melt into 'long' format and then extract the 'variable

library(data.table)
melt(setDT(df1), id.var = 'user',  na.rm = TRUE)[,
       .(NonNA_VarName = first(variable)), user][df1, on = .(user)]

Or another option is to group by 'user' and use which.max to return the index

setDT(df1)[,  NonNA_VarName := names(.SD)[which.max(unlist(.SD))], by = user]

data

df1 <- structure(list(user = 1:6, v1 = c(3, NA, 2, NA, NA, NA), v2 = c(NA, 
5, NA, NA, NA, NA), v3 = c(NA, NA, NA, 5, NA, 7), v4 = c(NA, 
NA, NA, NA, 4, NA)), class = "data.frame", row.names = c(NA, 
-6L))
akrun
  • 874,273
  • 37
  • 540
  • 662