
I have the following dataframe:

> head (PRED_BEST_SYSM_TEST2)
  V1 V2 V3 V4 V5
1  0  0  0  0  0
2  2  2  2  1  1
3  0  0  0  0  0
4  0  3  4  0  0
5  5  5  1  2  0
6  0  0  0  1  1

I would like to add a column to the dataframe containing the value that appears most often in each row, as follows:

  V1 V2 V3 V4 V5 max_res
1  0  0  0  0  0    0
2  2  2  2  1  1    2
3  0  0  0  0  0    0
4  0  3  4  0  0    0
5  5  5  1  2  0    5
6  0  0  0  1  1    0

I use the following code:

g <- function(df)
{
  X <- as.data.frame(t(apply( df, 1,
                              function(row)
                              {
                                u <- unique(row)
                                n <- rowSums(outer(u,row,"=="))
                                if (length(u)==1 )
                                {
                                  c(row,u[which.max(n)],max(n),"",0)
                                }
                                else
                                {
                                  c(row,u[which.max(n)],max(n))
                                }
                              })))  

  colnames(X) <- c(colnames(df),"max_res")

  return(X)
}

g1<-g(PRED_BEST_SYSM_TEST2)

When I run head(g1), I get very odd results such as:

  NA                  NA                  NA                  NA                  NA
                        NA                  NA                  NA                  NA                  NA
                        NA                  NA                  NA                  NA                  NA
                   NA                  NA                  NA                  NA                  NA                  NA
                   NA                  NA                  NA                  NA                       NA
                   NA                  NA                  NA                  NA                       NA
                   NA                  NA                  NA                  NA                       NA
                   NA                  NA                  NA                       NA                  NA

The PRED_BEST_SYSM_TEST2 dataframe details are:

 > str (PRED_BEST_SYSM_TEST2)
'data.frame':   100000 obs. of  5 variables:
 $ V1: Factor w/ 10 levels "0","1","2","3",..: 1 1 1 1 1 1 1 2 1 2 ...
 $ V2: Factor w/ 10 levels "0","1","2","3",..: 1 1 1 1 1 1 2 2 1 2 ...
 $ V3: Factor w/ 10 levels "0","1","2","3",..: 1 1 1 1 1 1 1 2 1 1 ...
 $ V4: Factor w/ 10 levels "0","1","2","3",..: 1 2 1 1 1 2 1 2 1 2 ...
 $ V5: Factor w/ 10 levels "0","1","2","3",..: 1 2 1 1 1 2 2 2 1 1 ...
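
One possible reading of the output above: the two branches of the inner function return vectors of different lengths (9 elements when a row has a single distinct value, 7 otherwise), so `apply` returns a list instead of a matrix, and only one new column name is supplied for the extra values. Below is a minimal sketch of an alternative, assuming only the most frequent value per row is needed and that ties go to the value that appears first in the row. The helper `row_mode` and the small `df` are hypothetical stand-ins for illustration, not the real data.

```r
# Hypothetical stand-in for the first rows of PRED_BEST_SYSM_TEST2 (factor columns).
df <- data.frame(
  V1 = c(0, 2, 0, 0, 5, 0),
  V2 = c(0, 2, 0, 3, 5, 0),
  V3 = c(0, 2, 0, 4, 1, 0),
  V4 = c(0, 1, 0, 0, 2, 1),
  V5 = c(0, 1, 0, 0, 0, 1)
)
df[] <- lapply(df, function(x) factor(x, levels = 0:9))

# Most frequent value in a vector; ties go to the value seen first.
row_mode <- function(row) {
  u <- unique(row)
  u[which.max(tabulate(match(row, u)))]
}

# as.matrix() turns factor columns into their labels (character), not integer codes.
m <- as.matrix(df)
df$max_res <- apply(m, 1, row_mode)
```

Because the function always returns a single value, `apply` yields one vector with an entry per row, which can be assigned directly as the new column. The result is a character column here, since the factor labels are compared as strings.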
  • Thanks @Cath! Is there a way to convert the whole dataframe into a numeric dataframe? – Avi Nov 10 '17 at 13:21
  • The dataframe is very big.... – Avi Nov 10 '17 at 13:24
  • This is how it is created: PRED_BEST_SYSM_TEST2 = matrix(0, nrow(testing2), 5) PRED_BEST_SYSM_TEST2<-as.data.frame (PRED_BEST_SYSM_TEST2) for (i in 1:(5)) { PRED_BEST_SYSM_TEST2[,i]<- (predict(cart.models[BEST_SYSM_TREES_TRAIN[[i]]], testing2[,c(1:10)],type='class'))} – Avi Nov 10 '17 at 13:24
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/158694/discussion-between-avi-and-cath). – Avi Nov 10 '17 at 13:32
  • @Cath! This is not a duplicate! The linked Q&As are about maximal values. Mine is about the maximal number of occurrences! – Avi Nov 10 '17 at 13:49
  • Sorry, your title was misleading, I changed the target – Cath Nov 10 '17 at 14:05
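
On the question in the comments about converting the whole dataframe to numeric: a common idiom is to go through `as.character` first, because `as.numeric` applied directly to a factor returns the internal level codes rather than the labels. A minimal sketch on a small hypothetical dataframe (`df` is a stand-in, not the real data):

```r
# Hypothetical two-column stand-in with factor columns, levels "0".."9".
df <- data.frame(
  V1 = factor(c(0, 2, 5), levels = 0:9),
  V2 = factor(c(0, 3, 5), levels = 0:9)
)

# as.numeric() directly on a factor gives level codes (1, 3, 6 here), not labels.
codes <- as.numeric(df$V1)

# Converting via as.character() recovers the labels; df[] keeps the data.frame shape.
df[] <- lapply(df, function(x) as.numeric(as.character(x)))
```

The conversion works column by column, so it should scale to a dataframe with many rows.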
