I would like to one-hot encode the five most frequent values of the 'Code' column in the data frame below:

ID | Code
1  | A
2  | A
3  | A
4  | F
5  | F
6  | C
7  | C
8  | E
9  | E
10 | D
10 | D
11 | B
12 | G
13 | H

The result should look like this:

ID | A | F | C | E | D |
1  | 1 | 0 | 0 | 0 | 0 | 
2  | 1 | 0 | 0 | 0 | 0 |
3  | 1 | 0 | 0 | 0 | 0 |
4  | 0 | 1 | 0 | 0 | 0 |
5  | 0 | 1 | 0 | 0 | 0 |
6  | 0 | 0 | 1 | 0 | 0 |
7  | 0 | 0 | 1 | 0 | 0 |
8  | 0 | 0 | 0 | 1 | 0 |
9  | 0 | 0 | 0 | 1 | 0 |
10 | 0 | 0 | 0 | 0 | 1 |
11 | 0 | 0 | 0 | 0 | 0 |
12 | 0 | 0 | 0 | 0 | 0 |
13 | 0 | 0 | 0 | 0 | 0 |

How can I do this with the top_n and dcast functions in R, or with dplyr?

spidermarn

1 Answer

Here is a base R solution:

# Cross-tabulate ID by Code, then keep the columns of the five most frequent codes
as.matrix(table(df$ID, df$Code))[, names(sort(table(df$Code), decreasing = TRUE)[1:5])]

     A C D E F
  1  1 0 0 0 0
  2  1 0 0 0 0
  3  1 0 0 0 0
  4  0 0 0 0 1
  5  0 0 0 0 1
  6  0 1 0 0 0
  7  0 1 0 0 0
  8  0 0 0 1 0
  9  0 0 0 1 0
  10 0 0 2 0 0
  11 0 0 0 0 0
  12 0 0 0 0 0
  13 0 0 0 0 0
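
Note that ID 10 shows a 2 under D because that ID appears twice in the example data. If strict 0/1 indicators are wanted, the counts can be capped at 1. The following is only a minimal sketch: the data frame is reconstructed from the table in the question, and the names `counts`, `top5`, `tab` and `onehot` are made up for illustration.

```r
# Example data reconstructed from the question (ID 10 appears twice)
df <- data.frame(
  ID   = c(1:10, 10:13),
  Code = c("A", "A", "A", "F", "F", "C", "C", "E", "E", "D", "D", "B", "G", "H")
)

# Names of the five most frequent codes
counts <- table(df$Code)
top5 <- names(counts)[order(-as.vector(counts))][1:5]

# Cross-tabulate ID by Code and keep only the top-5 columns
tab <- table(df$ID, df$Code)[, top5]

# Turn counts into 0/1 indicators: any count above zero becomes 1
onehot <- 1L * (unclass(tab) > 0)
onehot
```

Rows for IDs whose code is outside the top five (11, 12 and 13 here) come out as all zeros, as in the desired result.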
user2974951