0

I have a data frame with 29 rows and 26 column with a lot of NA's. Data looks somewhat like shown below( working on R studio)

df <-

V1   V2   V3   V4   V5   V6   V7   V8   V9   V10

a1   b1   d    f    d    d    na   na   na    f

a1   b2   d    d    d    f    na   f    na   na

a1   b3   d    f    d    f    na   na   d    d

a2   c1   f    f    d    na   na   d    d    f

a2   c2   f    d    d    f    na   f    na   na

a2   c3   d    f    d    f    na   na   f    d

Here we have V1-V10 columns. a1 and a2 are 2 distinct values in column V1,

b1-b3 in column V2 are distinct values related to a1 in V1 and c1-c3 related to a2.

column V3- V10 we have distinct values in each rows related to a1 and a2

Result i want is as below-

NewV1      max.occurrence(V3-V10)

 a1            d

 a2            f

to summarize i want to get the value with maximum occurrence(max.occurrence(V3-V10)) across column V3-V10 based on V1. NOTE= NA to be excluded.

talat
  • 68,970
  • 21
  • 126
  • 157
Abhinav Sharma
  • 45
  • 1
  • 2
  • 8
  • 4
    Is that like 'psudo-reproducible' example? - FYI [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – Sotos Mar 16 '18 at 15:30
  • it is like an example of result i want to achieve out of a similar dataset. – Abhinav Sharma Mar 16 '18 at 15:32

2 Answers2

1

Another possiblity using the data.table-package:

library(data.table)

melt(setDT(df),
     id = 1:2,
     na.rm = TRUE)[, .N, by = .(V1, value)
                   ][order(-N), .(max.occ = value[1]), by = V1]

which gives:

   V1 max.occ
1: a1       d
2: a2       f

A similar logic with the tidyverse-packages:

library(dplyr)
library(tidyr)

df %>% 
  gather(k, v, V3:V10, na.rm = TRUE) %>% 
  group_by(V1, v) %>% 
  tally() %>% 
  arrange(-n) %>% 
  slice(1) %>% 
  select(V1, max.occ = v)
Jaap
  • 81,064
  • 34
  • 182
  • 193
0

If you like dplyr, this would work:

df %>% 
    gather("key", "value", V3:V10) %>%
    group_by(V1) %>% 
    dplyr::summarise(max.occurence = names(which.max(table(value))))

This gives:

# A tibble: 2 x 2
  V1    max.occurence
  <fct> <chr>        
1 a1    d            
2 a2    f 
C. Braun
  • 5,061
  • 19
  • 47