1

I have medical registry data from 3 different sources and for many of my variables I have multiple entries from each registry. Each row contains data from only 1 registry (source). I have been able to coalesce the three together to make single 'new' variable, but I would also like to create a variable that says which source the coalesced variable originated from. I am new to using R in this way(normally I would scurry back to excel to manipulate variables) and I have spent some time looking for similar examples but cannot find an answer. Any help would be greatly appreciated. (First time poster, so suggestions on framing my question are also helpful).

    library(tidyverse)

    df <- tibble(var1 = c(1,2,NA,NA,NA), var2 = c(NA,NA,3,4,NA),var3 = c(NA,NA,NA,NA,5))
    df
    #># A tibble: 5 x 3
    #>    var1  var2  var3
    #>   <dbl> <dbl> <dbl>
    #>1     1    NA    NA
    #>2     2    NA    NA
    #>3    NA     3    NA
    #>4    NA     4    NA
    #>5    NA    NA     5

    #CoalesCe x, y and z to 'new' variable

    >df$new <- coalesce(df$var1,df$var2,df$var3)

    >df
    #># A tibble: 5 x 4
    #>     var1  var2  var3   new
    #>    <dbl> <dbl> <dbl> <dbl>
    #> 1     1    NA    NA     1
    #> 2     2    NA    NA     2
    #> 3    NA     3    NA     3
    #> 4    NA     4    NA     4
    #> 5    NA    NA     5     5

    #I would also like a variable that gives the 'source' of the coalesced variable, that         
    would look like below, but I cannot figure out how to do this 
    >df_final
    #># A tibble: 5 x 5
    #>   var1  var2  var3   new source
    #>   <dbl> <dbl> <dbl> <dbl> <chr> 
    #>1     1    NA    NA     1 var1  
    #>2     2    NA    NA     2 var1  
    #>3    NA     3    NA     3 var2  
    #>4    NA     4    NA     4 var2  
    #>5    NA    NA     5     5 var3 
RbFoz
  • 23
  • 4

2 Answers2

0

One option:

df$source <- 
  do.call(
    coalesce,
    lapply(seq_len(ncol(df)), function(i) ifelse(is.na(df[[i]]), NA, names(df)[[i]]))    
  )
# [1] "var1" "var1" "var2" "var2" "var3"

A second option (requires data.table)

names(df)[sapply(data.table::transpose(df), function(x) match(FALSE, is.na(x)))]
# [1] "var1" "var1" "var2" "var2" "var3"

A third pure base R solution:

names(df)[apply(df, 1, function(x) match(FALSE, is.na(x)))]
# [1] "var1" "var1" "var2" "var2" "var3"
s_baldur
  • 29,441
  • 4
  • 36
  • 69
0

Using rowwise:

tibble(var1 = c(1,2,NA,NA,NA), var2 = c(NA,NA,3,4,NA),var3 = c(NA,NA,NA,NA,5)) %>%
  rowwise() %>%
  mutate(source = names(.)[which(!is.na(c_across(var1:var3)))])

   var1  var2  var3 source
  <dbl> <dbl> <dbl> <chr> 
1     1    NA    NA var1  
2     2    NA    NA var1  
3    NA     3    NA var2  
4    NA     4    NA var2  
5    NA    NA     5 var3
det
  • 5,013
  • 1
  • 8
  • 16