How to encode top 5 values of a column

Question

I would like to one-hot encode the five most frequent values of the 'Code' column in the data frame below:

ID | Code
1  | A
2  | A
3  | A
4  | F
5  | F
6  | C
7  | C
8  | E
9  | E
10 | D
10 | D
11 | B
12 | G
13 | H

The result should look like this:

ID | A | F | C | E | D |
1  | 1 | 0 | 0 | 0 | 0 | 
2  | 1 | 0 | 0 | 0 | 0 |
3  | 1 | 0 | 0 | 0 | 0 |
4  | 0 | 1 | 0 | 0 | 0 |
5  | 0 | 1 | 0 | 0 | 0 |
6  | 0 | 0 | 1 | 0 | 0 |
7  | 0 | 0 | 1 | 0 | 0 |
8  | 0 | 0 | 0 | 1 | 0 |
9  | 0 | 0 | 0 | 1 | 0 |
10 | 0 | 0 | 0 | 0 | 1 |
11 | 0 | 0 | 0 | 0 | 0 |
12 | 0 | 0 | 0 | 0 | 0 |
13 | 0 | 0 | 0 | 0 | 0 |

How can I do this in R, for example with top_n and dcast, or with dplyr?

score 1 · Answer 1 · answered Feb 12 '20 at 08:16

Here is a base R solution:

as.matrix(table(df$ID, df$Code))[, names(sort(table(df$Code), decreasing = TRUE)[1:5])]

     A C D E F
  1  1 0 0 0 0
  2  1 0 0 0 0
  3  1 0 0 0 0
  4  0 0 0 0 1
  5  0 0 0 0 1
  6  0 1 0 0 0
  7  0 1 0 0 0
  8  0 0 0 1 0
  9  0 0 0 1 0
  10 0 0 2 0 0
  11 0 0 0 0 0
  12 0 0 0 0 0
  13 0 0 0 0 0
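
Note that the table above holds counts, so ID 10 shows 2 under D because that ID appears twice in the input. A minimal sketch of turning the counts into 0/1 indicators follows, assuming the question's data is held in a data frame named `df`; the names `top5`, `counts` and `dummies` are illustrative and not part of the original answer.

```r
# Sample data from the question (ID 10 appears twice with Code "D")
df <- data.frame(
  ID   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 12, 13),
  Code = c("A", "A", "A", "F", "F", "C", "C", "E", "E", "D", "D", "B", "G", "H")
)

# Names of the five most frequent codes
top5 <- names(sort(table(df$Code), decreasing = TRUE))[1:5]

# Contingency table of ID by Code, restricted to the top-5 columns
counts <- table(df$ID, df$Code)[, top5]

# Convert counts to 0/1 indicators: any count above zero becomes 1
dummies <- (counts > 0) + 0L
dummies
```

Columns come out in the order returned by `sort`. To get the column order shown in the question, the result can be indexed by name, e.g. `dummies[, c("A", "F", "C", "E", "D")]`.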

answered Feb 12 '20 at 08:16

user2974951

This does not dummify the variable but rather counts instances. For example, compare `ID = 10` in your output with the OP's output. – Sotos Feb 12 '20 at 08:22
@Sotos Yes, it's a trivial modification: we can just replace all values greater than 1 with 1. – user2974951 Feb 12 '20 at 08:23
You mean 1..... – Sotos Feb 12 '20 at 08:23
