0

I am using the dplyr::coalesce and dplyr::mutate to find all first non-missing values and stuff that into a new variable. However, I would like to also create a new variable with the information on which variable is used to infill the new variable.

Here is an example:

df <- dataframe(
      St1 = c(1, NA, NA, NA),
      St2 = c(NA, 3, NA, NA), 
      St3 = c(NA, NA, 12, NA),       
      St4 = c(NA, NA, NA, 4))

What I do : 

df <- df %>%
  mutate(df.coalesce = coalesce(St1, St2, St3, St4)) %>%
  select(df.coalesce)

Result: 

df.coalesce
1
3
12
4

Desired result: 

Station df.coalesce
St.1    1
St.2    3
St.3    12
St.4    4   

Is there a way to do that using the tidyverse grammar?

Thanks!

2 Answers2

0

You can use max.col to get column name with non-NA value in each row and use do.call with coalesce to apply it to all the columns.

library(dplyr)

df %>%
  transmute(Station = names(df)[max.col(replace(., is.na(.), 0))], 
            df.coalesce = do.call(coalesce, .))

#  Station df.coalesce
#1     St1           1
#2     St2           3
#3     St3          12
#4     St4           4
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

You can find all the ids having NA and then remove them.

train <- read.csv (file = "file", sep = ",", na.strings=c("NA"))
id_na_Cols <- sapply(train,function(x)any(is.na(x)))
trainData <- train[,!(id_na_Cols)]
write.table (trainData, file = "file_new", sep = ",")

Afterwards you can load new data for further analysis.