
I have a data frame with 52 rows and 161 columns. Its structure is shown below.

>str(CEPH)
'data.frame':   52 obs. of  161 variables:
$ id         : chr  "85" "86" "94" "00" ...
$ subgroup   : chr  "AAA" "AAA" "AAA" "AAA" ...
$ A1_A   : chr  "3:01" "3:01" "2:01" "2:01" ...
$ A1_B   : chr  "" "" "" "" ...
$ A2_A   : chr  "2:01" "32:01:01" "32:01:01" "68:01:02" ...
$ A2_B   : chr  "" "32:01:02" "32:01:02" "" ...
$ A2_C   : chr  "" "" "" "" ...
$ B1_A   : chr  "7:02:01" "44:03:01" "40:02:00" "44:02:00" ...
...

Some columns contain many NAs, so I need both the most frequent and the second most frequent value in each column. I tried the code below, but there are more than 50 columns, so passing them one at a time is not practical. Is there a way to get this for all columns at once, for example with sapply?

Input data:

 id subgroup A1_A A1_B A1_C A1_D A1_E A1_F A1_G
 1  85     AAA     3:01   ""       ""        ""      ""       ""
 2  86     AAA     3:01   05:01    ""        07:08   ""       ""
 3  94     AAA     2:01   05:01    ""        ""      ""       ""
 4  000    AAA     2:01   06:07    ""        ""      ""       ""
 5  37     AAA 30:01:00   07:08    ""        ""      ""       ""
 6  48     AAA     2:01   01:01    ""        ""      ""       ""

fre <- function(CEPH, col) {
  # Two most frequent values of a single column, highest count first
  q <- sort(table(CEPH[, col]), decreasing = TRUE)[1:2]
  return(q)
}
fre(AAA, 4)

This works for a single column, but the output does not include the column name:

  NA      32:01:02 
  49        2 

The desired output:

Types   Frequent_Type   Highest_Frequency
A1_A    2:01            20
A1_A    NA               5
A1_B    NA              49
A1_B    3:01:01          5
A1_C    2:01            20
A1_C    05:02            2
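A minimal sketch of one way to produce that long format for every column at once. It assumes that the empty strings in the input stand for missing values and that `id` and `subgroup` should be skipped. The small data frame is a hypothetical stand-in for `CEPH` built from the sample rows above, and `top_two()` is a hypothetical helper. `table(..., useNA = "ifany")` makes NA count as its own type.

```r
# Hypothetical stand-in for CEPH, built from the first rows of the sample input
CEPH <- data.frame(
  id       = c("85", "86", "94", "000", "37", "48"),
  subgroup = rep("AAA", 6),
  A1_A     = c("3:01", "3:01", "2:01", "2:01", "30:01:00", "2:01"),
  A1_B     = c("", "05:01", "05:01", "06:07", "07:08", "01:01"),
  A1_C     = c("", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Assumption: empty strings mean "missing", so recode them to NA before counting
CEPH[CEPH == ""] <- NA

# Two most frequent values of one column (NA counted as a value),
# labelled with the column name
top_two <- function(x, col_name) {
  counts <- sort(table(x, useNA = "ifany"), decreasing = TRUE)
  n <- min(2L, length(counts))  # a column may have only one distinct value
  data.frame(
    Types             = col_name,
    Frequent_Type     = names(counts)[seq_len(n)],
    Highest_Frequency = as.integer(counts[seq_len(n)]),
    stringsAsFactors  = FALSE
  )
}

# Run over every allele column (all except id and subgroup) and stack the pieces
allele_cols <- setdiff(names(CEPH), c("id", "subgroup"))
result <- do.call(rbind, lapply(allele_cols, function(col) top_two(CEPH[[col]], col)))
rownames(result) <- NULL
result
```

Columns with only one distinct value (such as `A1_C` here, which is entirely NA) contribute a single row. When two values tie for second place, which one is reported should be treated as arbitrary.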
  • @akrun I have added a small reproducible example of the input data –  Feb 15 '18 at 13:56
  • Please go through [this link](http://stackoverflow.com/questions/5963269) about reproducibility – Sotos Feb 15 '18 at 13:58
  • I think you can check [here](https://stackoverflow.com/questions/2547402/is-there-a-built-in-function-for-finding-the-mode) for ideas – akrun Feb 15 '18 at 14:02
  • What about if you try `apply(CEPH,2,function(x) sort(table(x),decreasing = T)[1:2])` – R18 Feb 15 '18 at 14:03
  • The same deleted post? https://stackoverflow.com/questions/48766725/frequency-count-of-multiple-columns-and-retrive-the-highest-frequency-in-r Why not edit, instead of posting the same? – zx8754 Feb 15 '18 at 14:09
  • @R18, Thank you. I tried it, but I only got the frequencies, not the types they belong to. I also tried `x<-t(sapply(CEPH, function(x) {t_x <- sort(table(x), decreasing=TRUE)[1:2]; list(value=names(t_x)[1], freq=t_x[1])}))`, but that only gives the highest frequency –  Feb 15 '18 at 14:15
  • @zx8754 That post was marked as a duplicate. That's why I created a new post –  Feb 15 '18 at 14:16
  • It would be better to edit the existing post (even if closed) and vote to re-open. Did you try the [suggested solution](https://stackoverflow.com/a/48767492/680068) in the linked posts? – zx8754 Feb 15 '18 at 14:20

1 Answer


This may not be an exact solution, but I managed to get the highest and second highest frequencies separately and then combine them.

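# Most frequent value and its count for every column (NA counted as a value)
first_highest <- t(sapply(CEPH, function(x) {t_x <- sort(table(x, useNA = "ifany"), decreasing = TRUE); list(value = names(t_x)[1], freq = as.integer(t_x[1]))}))
# Second most frequent value and its count for every column
second_highest <- t(sapply(CEPH, function(x) {t_x <- sort(table(x, useNA = "ifany"), decreasing = TRUE); list(value = names(t_x)[2], freq = as.integer(t_x[2]))}))

# One row per column of CEPH: value, freq, value, freq
frequency <- cbind(first_highest, second_highest)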
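The `t(sapply(...))` calls above return matrices whose cells are lists, which can be awkward to sort or export afterwards. A hedged alternative, assuming the same one-row-per-column layout is wanted, is to build each result column with `vapply()` so the types are fixed (character for values, integer for counts). `kth_value()` and `kth_freq()` are hypothetical helpers, and the small data frame is a hypothetical stand-in for `CEPH` with the empty strings already recoded to NA.

```r
# Hypothetical stand-in for CEPH: allele columns from the sample input, "" recoded to NA
CEPH <- data.frame(
  A1_A = c("3:01", "3:01", "2:01", "2:01", "30:01:00", "2:01"),
  A1_B = c(NA, "05:01", "05:01", "06:07", "07:08", "01:01"),
  stringsAsFactors = FALSE
)

# k-th most frequent value of a column (NA_character_ if there are fewer than k values)
kth_value <- function(x, k) {
  names(sort(table(x, useNA = "ifany"), decreasing = TRUE))[k]
}

# Count of the k-th most frequent value (NA_integer_ if there are fewer than k values)
kth_freq <- function(x, k) {
  as.integer(sort(table(x, useNA = "ifany"), decreasing = TRUE)[k])
}

# vapply() checks that every call returns exactly one value of the declared type
frequency <- data.frame(
  Types        = names(CEPH),
  first_value  = vapply(CEPH, kth_value, character(1), k = 1),
  first_freq   = vapply(CEPH, kth_freq,  integer(1),   k = 1),
  second_value = vapply(CEPH, kth_value, character(1), k = 2),
  second_freq  = vapply(CEPH, kth_freq,  integer(1),   k = 2),
  row.names    = NULL,
  stringsAsFactors = FALSE
)
frequency
```

This recomputes `table()` four times per column, which is fine for 161 columns of 52 rows but could be cached in a single pass for much larger data.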