I have a data frame with three columns:
df <- data.frame(UserId = c(1, 2, 2, 2, 3, 3), CatoId = c('C', 'A', 'B', 'C', 'D', 'E'), No = c(1, 9, 2, 2, 5, 3))
UserId CatoId No
     1      C  1
     2      A  9
     2      B  2
     2      C  2
     3      D  5
     3      E  3
I would like to reshape it into the following wide format:
UserId A B C D E
     1 0 0 1 0 0
     2 9 2 2 0 0
     3 0 0 0 5 3
where the columns represent all possible values of CatoId, and combinations that do not occur are filled with 0.
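One loop-free way to get this shape, sketched here with base R's `xtabs()` (a swapped-in choice, not something the question prescribes): it cross-tabulates `UserId` against `CatoId`, sums `No` within each cell, and leaves 0 wherever a combination never occurs. The names `tab` and `wide` are illustrative.

```r
# Example data from the question
df <- data.frame(UserId = c(1, 2, 2, 2, 3, 3),
                 CatoId = c('C', 'A', 'B', 'C', 'D', 'E'),
                 No     = c(1, 9, 2, 2, 5, 3))

# Cross-tabulate UserId x CatoId, summing No within each cell;
# combinations that never occur get 0
tab <- xtabs(No ~ UserId + CatoId, data = df)

# Turn the table into an ordinary data frame with one column per CatoId
wide <- as.data.frame.matrix(tab)

# Move the user IDs from the row names into a proper column
# (as.numeric() assumes numeric IDs, as in the example)
wide <- cbind(UserId = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
print(wide)
```

Because `xtabs()` sums, a repeated (UserId, CatoId) pair would be added together rather than overwritten.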
The original data frame has 2 million rows, and CatoId has 21 distinct values, so I would rather avoid loops. Is there a way to do this in R? If not, what is the best way to proceed?
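For 2 million rows, a possibly leaner base-R sketch preallocates the user-by-category matrix of zeros and fills it with a single indexed assignment, so no R-level loop runs: `match()` maps every row to its row and column position at once. On the real data the result would have one row per user and 21 columns. The names `users`, `cats` and `m` are illustrative, and the sketch has not been benchmarked.

```r
df <- data.frame(UserId = c(1, 2, 2, 2, 3, 3),
                 CatoId = c('C', 'A', 'B', 'C', 'D', 'E'),
                 No     = c(1, 9, 2, 2, 5, 3))

users <- sort(unique(df$UserId))
cats  <- sort(unique(df$CatoId))

# Assumption: each (UserId, CatoId) pair appears at most once. If a pair
# repeated, the assignment below would keep only the last value, not the sum.
stopifnot(!anyDuplicated(df[c("UserId", "CatoId")]))

# Preallocate a users x categories matrix filled with zeros
m <- matrix(0, nrow = length(users), ncol = length(cats),
            dimnames = list(users, cats))

# Fill every observed cell in one vectorized step, using a two-column
# (row index, column index) matrix as the subscript
m[cbind(match(df$UserId, users), match(df$CatoId, cats))] <- df$No
print(m)
```

A numeric matrix like `m` can be passed directly to most clustering functions without converting back to a data frame.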
My goal is to apply a clustering algorithm to the resulting data frame.
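Once the wide matrix exists, it can go straight into a clustering function. The sketch below assumes k-means via `stats::kmeans()`, with `centers = 2` picked arbitrarily for the toy data; the question does not say which algorithm or how many clusters to use.

```r
df <- data.frame(UserId = c(1, 2, 2, 2, 3, 3),
                 CatoId = c('C', 'A', 'B', 'C', 'D', 'E'),
                 No     = c(1, 9, 2, 2, 5, 3))

# Wide numeric matrix: one row per UserId, one column per CatoId (0 = absent)
x <- as.matrix(as.data.frame.matrix(xtabs(No ~ UserId + CatoId, data = df)))

set.seed(42)                               # k-means starts from random centres
km <- kmeans(x, centers = 2, nstart = 10)  # centers = 2 is an arbitrary choice

# Cluster assignment per user, keyed by UserId
print(data.frame(UserId = rownames(x), cluster = km$cluster))
```

k-means works on the rows directly, whereas `hclust()` needs the full `dist()` matrix, whose size grows quadratically with the number of users and is likely impractical at this scale. It may also be worth deciding whether to `scale()` the columns first; a column that is constant across users would become `NaN` after scaling.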