Putting the content of many columns in a new single column

Question

Thanks for the answers to my previous question, but I need to update it, since the solutions don't work with my real-life example, i.e. a 3170x11 data frame.

Let me briefly recap. I have a 3170x11 data frame filled with the terms 'Normale', 'Delezioni' or NA. I would like to coalesce the columns into a single new column that reports which term appears in each row: 'Normale', 'Delezioni' or NA. If a row contains only 'Normale' and NA, the result should be 'Normale'. If a row contains only 'Delezioni' and NA, the result should be 'Delezioni'. If a row contains only NA, the result should be NA. However, if both 'Normale' and 'Delezioni' appear in the same row, the result should be 'Error'. Akrun and others posted a nice solution (Coalescing many columns into one column), but, as I said, it doesn't work on the larger data set:

library(RCurl)
library(dplyr) # provides %>%, mutate_if, transmute, case_when and coalesce
a <- getURL('http://download1645.mediafire.com/pp9z3okh5tgg/96px8ophovxrxe9/example.tab')
df2 <- read.table(text=a,header=TRUE, sep = "\t")
df2 <- data.frame(lapply(df2, as.character), stringsAsFactors=FALSE) #converts from factor to character
res <- df2 %>%
   mutate_if(~ all(is.na(.)) && is.logical(.), ~ NA_character_) %>%
   transmute(Summary = case_when(rowSums(!is.na(.)) > 1 ~ "Error",
            TRUE ~ coalesce(!!! .)))

res contains several mistakes. For instance the first lines should be:

  Summary
1   Normale
2    <NA>
3    <NA>
4    <NA>
5   Normale
6   Normale

Instead they are:

> head (res)
  Summary
1   Error
2    <NA>
3    <NA>
4    <NA>
5   Error
6   Error

Thanks

Rui Barradas · Answer 1 · 2020-01-15T05:59:24.070

2

The following works for me, with the data set in the link.

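f1 <- function(x){
  # distinct non-NA values found in the row
  y <- unique(x[!is.na(x)])
  if(length(y) == 0)
    NA_character_      # the row holds only NA
  else if(length(y) == 1)
    y                  # a single term, possibly repeated
  else "Error"         # two or more different terms
}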

df2$Summary <- apply(df2, 1, f1)

This needs no external packages, only base R.
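
To illustrate why the function looks at distinct values, here is a small sketch on made-up rows (the `toy` data frame is an assumption for illustration, not taken from the linked file). The `rowSums(!is.na(.)) > 1` test in the question counts non-NA cells, so a row that repeats the same term in several columns is flagged, while counting distinct values is not affected by the repetition.

```r
# Toy rows, invented for illustration only
toy <- data.frame(
  a = c("Normale", NA, "Delezioni", "Normale"),
  b = c("Normale", NA, NA,          "Delezioni"),
  stringsAsFactors = FALSE
)

# Counts non-NA cells: row 1 is flagged although both cells agree
flag_by_cells <- rowSums(!is.na(toy)) > 1

# Counts distinct non-NA values: only row 4 has more than one
n_distinct_values <- apply(toy, 1, function(x) length(unique(x[!is.na(x)])))
flag_by_values <- n_distinct_values > 1

print(flag_by_cells)
print(flag_by_values)
```

Only the fourth row, which mixes the two terms, is flagged by the second test; the first test also flags the first row.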

edited Jan 15 '20 at 05:59

answered Jan 15 '20 at 05:30

Rui Barradas


Thanks, but there are errors. For instance, the first element of df2$Summary should be 'Normale', not 'Error'. – Arturo Jan 15 '20 at 05:43

Ronak Shah · Accepted Answer · 2020-01-15T05:27:30.930

1

I think you can define a simple function that implements your requirement:

apply_fun <- function(x) {
  if(all(c("Delezioni","Normale") %in% x)) return('Error')
  if("Delezioni" %in% x) return('Delezioni')
  if("Normale" %in% x)  return('Normale')
  else NA
}

and then apply it row-wise:

example$answer <- apply(example, 1, apply_fun)
head(example$answer)
#[1] "Normale" NA        NA        NA        "Normale" "Normale"

If you need a tidyverse/dplyr answer, we can convert the multiple if statements to case_when and then use pmap_chr:

library(tidyverse)

apply_fun <- function(x) {
  case_when(all(c("Delezioni","Normale") %in% x) ~ "Error", 
            "Delezioni" %in% x ~ "Delezioni", 
            "Normale" %in% x ~ "Normale", 
            TRUE ~ NA_character_)
}

output <- example %>% mutate(answer = pmap_chr(., ~apply_fun(c(...))))
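
As a possible alternative, the same rules can be written without a row-wise loop by testing the whole table at once. This is only a sketch: `summarise_rows` and the `toy` data frame are hypothetical names introduced here, and it assumes the cells contain exactly 'Normale', 'Delezioni' or NA.

```r
# Hypothetical helper: vectorised version of the row rules (base R only)
summarise_rows <- function(df) {
  m <- as.matrix(df)
  # TRUE where a row contains the term at least once; NA cells are ignored
  has_norm <- rowSums(m == "Normale", na.rm = TRUE) > 0
  has_del  <- rowSums(m == "Delezioni", na.rm = TRUE) > 0

  out <- rep(NA_character_, nrow(m))   # default: the row holds only NA
  out[has_norm] <- "Normale"
  out[has_del]  <- "Delezioni"
  out[has_norm & has_del] <- "Error"   # both terms in the same row
  out
}

# Toy rows, invented for illustration only
toy <- data.frame(
  a = c("Normale", NA, "Delezioni", "Normale"),
  b = c("Normale", NA, NA,          "Delezioni"),
  stringsAsFactors = FALSE
)

toy_summary <- summarise_rows(toy)
print(toy_summary)
```

On a table of a few thousand rows the difference in speed from `apply` is unlikely to matter, so this is mostly a matter of taste.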

edited Jan 15 '20 at 05:27

answered Jan 15 '20 at 05:14

Ronak Shah


`example$answer <- apply(example, 1, apply_fun)` gives: Error in apply(example, 1, apply_fun) : dim(X) must have a positive length – Arturo Jan 15 '20 at 06:04
@Arturo Are you using the correct dataframe name? I am using `example`, I guess in your case it is `df2`. You also need to run `apply_fun` in your console so that the function is present in your environment. – Ronak Shah Jan 15 '20 at 06:04
Yes. Even with df2 things don't change: `example$answer <- apply(df2, 1, apply_fun)` gives: Error in example$answer <- apply(df2, 1, apply_fun) : object of type 'closure' is not subsettable – Arturo Jan 15 '20 at 06:07
`df2$answer <- apply(df2, 1, apply_fun)` works for me as shown for the data that you have shared in the link. – Ronak Shah Jan 15 '20 at 06:09
It works! Thank you so much to you and to the other person who answered my question. – Arturo Jan 15 '20 at 06:14
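
A note on the two error messages quoted in the comments: both are what R reports when no data frame called `example` exists in the session, because the name then resolves to the function `utils::example`. The sketch below triggers the errors on purpose and captures their messages. `fn` is just an illustrative name, and whether this was the actual cause in the session above is an assumption.

```r
# `example` is a function shipped with R (package utils), not a data frame
fn <- utils::example
print(is.function(fn))

# apply() needs an object with dimensions; a function has none
msg_apply <- tryCatch(
  apply(fn, 1, length),
  error = function(e) conditionMessage(e)
)

# `$<-` cannot add a column to a function (a "closure")
msg_assign <- tryCatch(
  { fn$answer <- 1; "no error" },
  error = function(e) conditionMessage(e)
)

print(msg_apply)
print(msg_assign)
```

Assigning the result to the data frame that actually holds the data, as in `df2$answer <- apply(df2, 1, apply_fun)`, avoids both messages.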
