Thanks for the answers to my previous question, but I need to update it, since the solutions don't work with my real-life example, i.e., a 3170x11 dataframe.

To recap: I have a 3170x11 dataframe filled with the terms 'Normale', 'Delezioni' or NA. I would like to coalesce the columns into a new column that reports which term appears on each row, that is 'Normale', 'Delezioni' or NA. If 'Normale' and NA are present on the same row, the result should be 'Normale'. If 'Delezioni' and NA are present on the same row, the result should be 'Delezioni'. If only NA is present, the result should be NA. However, if both 'Normale' and 'Delezioni' are present, the result should be 'Error'. Akrun and others posted a nice solution (Coalescing many columns into one column), but, as I said, it doesn't work on the larger data:
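Since the linked file may not stay available, here is a small stand-in for the rules above. It is only a sketch: the frame `toy`, its column names and the vector `expected` are made up for illustration and are not taken from the real data.

```r
# Hypothetical toy data: three columns instead of eleven
toy <- data.frame(
  a = c("Normale", NA, "Delezioni", "Normale"),
  b = c(NA,        NA, "Delezioni", "Delezioni"),
  c = c("Normale", NA, NA,          NA),
  stringsAsFactors = FALSE
)

# Summary each row should get under the rules described above:
# row 1: only 'Normale' (twice) and NA   -> 'Normale'
# row 2: only NA                         -> NA
# row 3: only 'Delezioni' and NA         -> 'Delezioni'
# row 4: both 'Normale' and 'Delezioni'  -> 'Error'
expected <- c("Normale", NA, "Delezioni", "Error")

# Show the data next to the expected result
cbind(toy, expected)
```

Note that row 1 holds the same term twice; under the rules above that is still 'Normale', not 'Error'.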

library(RCurl)
library(dplyr)  # needed for %>%, mutate_if, transmute, case_when, coalesce

a <- getURL('http://download1645.mediafire.com/pp9z3okh5tgg/96px8ophovxrxe9/example.tab')
df2 <- read.table(text = a, header = TRUE, sep = "\t")
# convert factor columns to character
df2 <- data.frame(lapply(df2, as.character), stringsAsFactors = FALSE)
res <- df2 %>%
  mutate_if(~ all(is.na(.)) && is.logical(.), ~ NA_character_) %>%
  # any row with more than one non-NA value is labelled "Error"
  transmute(Summary = case_when(rowSums(!is.na(.)) > 1 ~ "Error",
                                TRUE ~ coalesce(!!! .)))

res contains several mistakes. For instance the first lines should be:

  Summary
1   Normale
2    <NA>
3    <NA>
4    <NA>
5   Normale
6   Normale

Instead they are:

> head(res)
  Summary
1   Error
2    <NA>
3    <NA>
4    <NA>
5   Error
6   Error 

Thanks

Arturo

2 Answers


The following works for me, with the data set in the link.

f1 <- function(x){
  y <- unique(x[!is.na(x)])
  if(length(y) == 0) 
    NA 
  else if(length(y) == 1) 
    y 
  else "Error"
}

df2$Summary <- apply(df2, 1, f1)

It needs no external packages, only base R.
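If a row that looks like it holds a single term still comes back as "Error", it may be worth checking which distinct values the columns really contain. The sketch below uses a made-up frame `toy`; the empty string in it is purely an assumption for illustration and says nothing about the linked file.

```r
# Same summarising function as above
f1 <- function(x){
  y <- unique(x[!is.na(x)])
  if(length(y) == 0) NA else if(length(y) == 1) y else "Error"
}

# Hypothetical toy frame: the "" in row 1 is an assumed stray value
toy <- data.frame(
  a = c("Normale", NA),
  b = c("",        NA),
  stringsAsFactors = FALSE
)

# Distinct values across all columns, NA included
table(unlist(toy), useNA = "ifany")

# With the stray value present, row 1 has two distinct non-NA values
before <- unname(apply(toy, 1, f1))

# Turn empty strings into NA, then summarise again
toy[!is.na(toy) & toy == ""] <- NA
after <- unname(apply(toy, 1, f1))
```

Here `before` flags row 1 as "Error" while `after` gives "Normale", which is why a quick look at `table()` can save some head scratching.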

Rui Barradas
  • Thanks, but there are errors. For instance, the first element of df2$Summary should be 'Normale', not 'Error'. – Arturo Jan 15 '20 at 05:43

I think you can define a simple function that implements your requirements:

apply_fun <- function(x) {
  if(all(c("Delezioni","Normale") %in% x)) return('Error')
  if("Delezioni" %in% x) return('Delezioni')
  if("Normale" %in% x)  return('Normale')
  else NA
}

and then apply it row-wise

example$answer <- apply(example, 1, apply_fun)
head(example$answer)
#[1] "Normale" NA        NA        NA        "Normale" "Normale"

If a tidyverse/dplyr answer is needed, we can convert these multiple if statements to case_when and then use pmap_chr:

library(tidyverse)

apply_fun <- function(x) {
  case_when(all(c("Delezioni","Normale") %in% x) ~ "Error", 
            "Delezioni" %in% x ~ "Delezioni", 
            "Normale" %in% x ~ "Normale", 
            TRUE ~ NA_character_)
}

output <- example %>% mutate(answer = pmap_chr(., ~apply_fun(c(...))))
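For a frame with thousands of rows, the same rules can also be written without calling a function per row. This is a different technique from the two versions above (a vectorised base R sketch using `rowSums` and `ifelse`), and the frame `toy` and the names `has_normale`, `has_delezioni` and `summary_col` are made up for illustration.

```r
# Hypothetical toy data
toy <- data.frame(
  a = c("Normale", NA, "Delezioni", "Normale"),
  b = c(NA,        NA, "Delezioni", "Delezioni"),
  stringsAsFactors = FALSE
)

# Work on a character matrix so the comparisons are vectorised
m <- as.matrix(toy)

# Does each row contain the term at least once? NA cells are ignored.
has_normale   <- rowSums(m == "Normale",   na.rm = TRUE) > 0
has_delezioni <- rowSums(m == "Delezioni", na.rm = TRUE) > 0

# Same priority as the if statements: both terms, then each term, else NA
summary_col <- unname(
  ifelse(has_normale & has_delezioni, "Error",
  ifelse(has_delezioni, "Delezioni",
  ifelse(has_normale, "Normale", NA_character_)))
)
summary_col
# "Normale" NA "Delezioni" "Error"
```

Like `apply_fun`, this only looks for the two known terms, so any other value in a cell is ignored rather than counted.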
Ronak Shah
  • `example$answer <- apply(example, 1, apply_fun)` gives `Error in apply(example, 1, apply_fun) : dim(X) must have a positive length` – Arturo Jan 15 '20 at 06:04
  • @Arturo Are you using the correct dataframe name? I am using `example`, I guess in your case it is `df2`. You also need to run `apply_fun` in your console so that the function is present in your environment. – Ronak Shah Jan 15 '20 at 06:04
  • Yes. Even with df2 things don't change: `example$answer <- apply(df2, 1, apply_fun)` gives `Error in example$answer <- apply(df2, 1, apply_fun) : object of type 'closure' is not subsettable` – Arturo Jan 15 '20 at 06:07
  • `df2$answer <- apply(df2, 1, apply_fun)` works for me as shown for the data that you have shared in the link. – Ronak Shah Jan 15 '20 at 06:09
  • It works! Thank you so much to you and to the other person who answered my question. – Arturo Jan 15 '20 at 06:14
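
Both error messages in the comments above are consistent with the name `example` resolving to the function `utils::example()` instead of a data frame, which is what happens when no object called `example` has been created in the session. A minimal sketch of that situation follows; the names `f`, `msg_apply` and `msg_assign` are made up for illustration.

```r
# `example` is a function shipped with R in the utils package
is.function(utils::example)

# apply() needs something with dimensions; a function has none,
# so this fails (the comments above report the dim(X) message)
msg_apply <- tryCatch(
  apply(utils::example, 1, identity),
  error = function(e) conditionMessage(e)
)

# Assigning a column with $ on a function fails as well
# (the comments above report the 'closure' message)
f <- utils::example
msg_assign <- tryCatch(
  { f$answer <- 1; NULL },
  error = function(e) conditionMessage(e)
)

msg_apply
msg_assign
```

Using the name of the data frame that actually exists, here `df2`, on both sides of the assignment avoids both errors.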