1

I apologize if I post a similar question to one I asked earlier, but I realized that my original question wasn't very clear.

I have a dataframe with five columns and 6 rows (actually they are many more, just trying to simplify matters):

One Two Three   Four    Five
Cat NA  NA  NA  NA
NA  Dog NA  NA  NA
NA  NA  NA  Mouse   NA
Cat NA  Rat NA  NA
Horse   NA  NA  NA  NA
NA NA NA NA NA

Now, I would like to coalesce all the information in a new single column ('Summary'), like this:

Summary
Cat
Dog
Mouse
Error
Horse
NA

Please note the 'Error' reported on the fourth Summary row, because two different values have been reported during the merging.Also please note that in case there are only NAs in a row, it should be reported 'NA' and not 'Error' I tried to look at the 'coalesce' function in the dplyr package, but it really desn't seem to do what I need. Thanks in advance.

Arturo
  • 342
  • 1
  • 4
  • 14
  • 1
    I left the same comment on your previous question, but seeing the code you've tried, even if it didn't work how you wanted, might be helpful for understanding your approach & logic – camille Jan 14 '20 at 21:15

3 Answers3

7

One base R option could be:

ifelse(rowSums(!is.na(df)) > 1, "Error", do.call(pmin, c(df, na.rm = TRUE))) 

[1] "Cat"   "Dog"   "Mouse" "Error" "Horse" NA 
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
2

We can use coalesce from dplyr. It is possible to have the NA only column to be of type logical and this could have a clash when we use coalesce. One option is to make changes in the class for that column and then coalesce would work

library(dplyr)
df1 %>%
   mutate_if(~ all(is.na(.)) && is.logical(.), ~ NA_character_) %>%
   transmute(Summary = case_when(rowSums(!is.na(.)) > 1 ~ "Error",
            TRUE ~ coalesce(!!! .)))
#  Summary
#1     Cat
#2     Dog
#3   Mouse
#4   Error
#5   Horse
#6    <NA>

Data

df1 <- structure(list(One = c("Cat", NA, NA, "Cat", "Horse", NA), Two = c(NA, 
"Dog", NA, NA, NA, NA), Three = c(NA, NA, NA, "Rat", NA, NA), 
    Four = c(NA, NA, "Mouse", NA, NA, NA), Five = c(NA, NA, NA, 
    NA, NA, NA)), class = "data.frame", row.names = c(NA, -6L
))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks, but it should report 'Error' on the fourth row – Arturo Jan 14 '20 at 21:17
  • @Arturo Earlier, I was thinking about the duplicate elements instead of the non-NA count in the same row – akrun Jan 14 '20 at 22:25
  • Thanks, but it doesn't work with the real life example: library (RCurl) a <- getURL('http://download1645.mediafire.com/pp9z3okh5tgg/96px8ophovxrxe9/example.tab') df2 <- read.table(text=a,header=TRUE, sep = "\t") df2 <- data.frame(lapply(df2, as.character), stringsAsFactors=FALSE) res <- df2 %>% mutate_if(~ all(is.na(.)) && is.logical(.), ~ NA_character_) %>% transmute(Summary = case_when(rowSums(!is.na(.)) > 1 ~ "Error", TRUE ~ coalesce(!!! .))) The first line of 'res' should be 'Normale' – Arturo Jan 14 '20 at 22:34
2

Reducing across the columns of df (starting with the first), compare the current column (old) to the next (new). For each element:

  • If old is NA, choose new

  • If old is not NA, then choose old, unless new is also not NA, then 'Error'

:

Reduce(
  function(old, new) ifelse(is.na(old), new, ifelse(!is.na(new), 'Error', old)),
  df)

# [1] "Cat"   "Dog"   "Mouse" "Error" "Horse" NA
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38