I have a data frame with 52 rows and 161 columns. Its structure is shown below.
>str(CEPH)
'data.frame': 52 obs. of 161 variables:
$ id : chr "85" "86" "94" "00" ...
$ subgroup : chr "AAA" "AAA" "AAA" "AAA" ...
$ A1_A : chr "3:01" "3:01" "2:01" "2:01" ...
$ A1_B : chr "" "" "" "" ...
$ A2_A : chr "2:01" "32:01:01" "32:01:01" "68:01:02" ...
$ A2_B : chr "" "32:01:02" "32:01:02" "" ...
$ A2_C : chr "" "" "" "" ...
$ B1_A : chr "7:02:01" "44:03:01" "40:02:00" "44:02:00" ...
...
Some columns contain many NAs, so I need both the highest and the second-highest frequency in each column. I tried the code below, but there are more than 50 columns, so passing them one at a time is not practical. Is there a way to retrieve this for all columns using sapply?
Input data:
  id  subgroup A1_A     A1_B  A1_C A1_D  A1_E A1_F A1_G
1 85  AAA      3:01     ""    ""   ""    ""   ""
2 86  AAA      3:01     05:01 ""   07:08 ""   ""
3 94  AAA      2:01     05:01 ""   ""    ""   ""
4 000 AAA      2:01     06:07 ""   ""    ""   ""
5 37  AAA      30:01:00 07:08 ""   ""    ""   ""
6 48  AAA      2:01     01:01 ""   ""    ""   ""
# Return the two most frequent values of a single column
fre <- function(CEPH, col) {
  q <- sort(table(CEPH[, col]), decreasing = TRUE)[1:2]
  return(q)
}
fre(AAA, 4)
This returns the result without the column name:
NA 32:01:02
49 2
The desired output:
Types Frequent_Type Highest_Frequency
A1_A 2:01 20
A1_A NA 5
A1_B NA 49
A1_B 3:01:01 5
A1_C 2:01 20
A1_C 05:02 2
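
One possible way to get this shape for all columns at once is sketched below. It is only an illustration under some assumptions: `CEPH_demo` is made-up toy data standing in for `CEPH`, `top_freq` is a hypothetical helper name, empty strings are treated as missing, and every column other than `id` and `subgroup` is assumed to be a type column. It uses `lapply` rather than `sapply`, because each column yields a small data frame that is then stacked with `rbind`.

```r
# Toy data standing in for CEPH (values are illustrative, not the real data).
CEPH_demo <- data.frame(
  id       = c("85", "86", "94", "00", "37", "48"),
  subgroup = rep("AAA", 6),
  A1_A     = c("3:01", "3:01", "2:01", "2:01", "30:01:00", "2:01"),
  A1_B     = c("", "05:01", "05:01", "", "", "01:01"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n most frequent values of one column,
# returned as a data frame that carries the column name along.
top_freq <- function(df, col, n = 2) {
  x <- df[[col]]
  x[!is.na(x) & x == ""] <- NA            # assumption: "" means missing
  # useNA = "ifany" makes table() count NA instead of dropping it
  tab <- sort(table(x, useNA = "ifany"), decreasing = TRUE)
  tab <- tab[seq_len(min(n, length(tab)))]
  data.frame(
    Types             = col,
    Frequent_Type     = names(tab),
    Highest_Frequency = as.vector(tab),
    stringsAsFactors  = FALSE
  )
}

# Assumption: all columns except id and subgroup should be summarised.
type_cols <- setdiff(names(CEPH_demo), c("id", "subgroup"))

# One small data frame per column, stacked into a single result.
result <- do.call(rbind, lapply(type_cols, top_freq, df = CEPH_demo))
rownames(result) <- NULL
print(result)
```

On the real data, the same call with `df = CEPH` should give two rows per column. When several values share the same count, which of them lands in the top two depends on the sort order, so ties may need separate handling.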