Using the Titanic data set, I created several survival predictions, and I want to compute the final Survived value by majority vote: if most of the predictions say the passenger survived, the final outcome is 1, otherwise 0.

> str(temp)
'data.frame':   179 obs. of  3 variables:
 $ predictions_ldm    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ predictions_qda    : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
 $ predictions_glm_age: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
> temp[c(4,5,12),]
   predictions_ldm predictions_qda predictions_glm_age
4                0               0                   0
5                0               1                   0
12               1               1                   0

I want the result to be

> temp[c(4,5,12),]
   predictions_ldm predictions_qda predictions_glm_age            Survived
4                0               0                   0                   0
5                0               1                   0                   0
12               1               1                   0                   1

How can I achieve this?

beasst
  • You could take the rowMeans and assign 1 where they are greater than 0.5, else 0, like `ifelse(rowMeans(temp) > 0.5, 1, 0)`? – morgan121 Jun 03 '20 at 21:42
  • The problem is that temp is a dataframe of factors – beasst Jun 03 '20 at 21:51
  • I found a solution, but I don't think that is the optimal one `votedSurvival <- as.factor(as.numeric(apply(temp,1,FUN = function(z) {mean(as.numeric(z))>0.5})))` – beasst Jun 03 '20 at 21:51
  • Good solution. Another similar way is to use `apply(temp, 1, function(x) names(sort(-table(x)))[1])`, which doesn't convert to numeric – morgan121 Jun 03 '20 at 21:58

2 Answers

It's an unnecessarily complicated solution using dplyr, but I really wanted to use `c_across()`. First I needed to convert your factors into integers while keeping the 0-1 values.

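library(dplyr)  # across() and c_across() need dplyr >= 1.0.0

temp %>%
  # Go through character so the labels 0/1 are kept, not the level codes 1/2
  mutate(across(where(is.factor), function(x) {
    x %>%
      as.character() %>%
      as.integer()
  })) %>%
  rowwise() %>%
  # Row mean of the 0/1 votes, rounded to the majority value
  mutate(Survived = c_across(everything()) %>%
           mean() %>%
           round() %>%
           as.integer()) %>%
  ungroup()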
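
The detour through `as.character()` matters because `as.integer()` applied directly to a factor returns the internal level codes (1 and 2 here) rather than the labels 0 and 1. A minimal base R sketch of that difference, and of the same majority vote via `rowMeans()`, is shown below. The data frame `demo` is a hypothetical stand-in rebuilt from rows 4, 5 and 12 of the question, not the real `temp`.

```r
# Hypothetical stand-in for `temp`: rows 4, 5 and 12 from the question.
demo <- data.frame(
  predictions_ldm     = factor(c("0", "0", "1"), levels = c("0", "1")),
  predictions_qda     = factor(c("0", "1", "1"), levels = c("0", "1")),
  predictions_glm_age = factor(c("0", "0", "0"), levels = c("0", "1"))
)

# as.integer() on a factor yields the level codes, not the labels.
codes  <- as.integer(demo$predictions_qda)                # 1 2 2
labels <- as.integer(as.character(demo$predictions_qda))  # 0 1 1

# Convert every column to 0/1 integers, then take the majority per row.
votes    <- as.data.frame(lapply(demo, function(x) as.integer(as.character(x))))
survived <- as.integer(rowMeans(votes) > 0.5)             # 0 0 1
```

With an odd number of binary predictions the row mean is never exactly 0.5, so the comparison and `round()` agree on every row.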
Martin Gal

You can use the Mode function defined here:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

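and apply this function row-wise to the whole data frame (assigning the result for only rows 4, 5 and 12 to a 179-row data frame would fail with a length mismatch):

temp$Survived <- apply(temp, 1, Mode)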

Mode returns the most frequently occurring value from a vector.
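
As a rough check of the row-wise call, the sketch below runs `Mode` on a small data frame. `demo` and `voted` are hypothetical names, and the three rows are copied from rows 4, 5 and 12 in the question rather than taken from the real `temp`.

```r
# Mode as defined above: the most frequent value of a vector.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical stand-in for `temp`: rows 4, 5 and 12 from the question.
demo <- data.frame(
  predictions_ldm     = factor(c("0", "0", "1"), levels = c("0", "1")),
  predictions_qda     = factor(c("0", "1", "1"), levels = c("0", "1")),
  predictions_glm_age = factor(c("0", "0", "0"), levels = c("0", "1"))
)

# apply() coerces the factor columns to a character matrix,
# so Mode returns "0" or "1" for each row.
voted <- apply(demo, 1, Mode)

# Store the vote as a factor with the same levels as the predictions.
demo$Survived <- factor(voted, levels = c("0", "1"))
```

On a tie, `which.max()` picks the value that appears first in the row, so the result is only a true majority when the number of prediction columns is odd.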

Ronak Shah