I need help converting a long data set of dimension 1,558,810 x 84 into a wide data set of dimension 1,558,810 x 4,784.
Let me explain how and why. My raw data has three main columns:
id  empId  dept
 1  a      social
 2  a      Hist
 3  a      math
 4  b      comp
 5  a      social
 6  b      comp
 7  c      math
 8  c      Hist
 9  b      math
10  a      comp
id is the unique key: it records which employee visited which university department on a given day. I need this transformed as below:
id  empId  dept    social  Hist  math  comp
 1  a      social  1       0     0     0
 2  a      Hist    0       1     0     0
 3  a      math    0       0     1     0
 4  b      comp    0       0     0     1
 5  a      social  1       0     0     0
 6  b      comp    0       0     0     1
 7  c      math    0       0     1     0
 8  c      Hist    0       1     0     0
 9  b      math    0       0     1     0
10  a      comp    0       0     0     1
I have two datasets, one with 49k rows and one with 1.55 million rows. For the smaller dataset, which has 1,100 unique department values, dcast from the reshape2 package gave me the desired result (3 + 1,100 columns and 49k rows). But when I run the same call on the larger dataset, which has 4,700 unique department values, R crashes with a memory error. I tried various alternatives such as xtabs and reshape, but each of them failed with a memory error as well.
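For reference, the dcast call was along these lines (a sketch; dept is kept on both sides of the formula, and aggregating with length() turns each (id, dept) cell into the 0/1 count shown above):

library(reshape2)
# each id occurs exactly once with one dept, so counting rows per cell
# with length() produces the desired 0/1 indicators
wide <- dcast(ds, id + empId + dept ~ dept, fun.aggregate = length, value.var = "dept")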
I have now resorted to a crude for loop for this purpose:
columns <- unique(ds$dept)
# add one 0/1 indicator column per unique department
for (i in seq_along(columns)) {
  ds[, columns[i]] <- ifelse(ds$dept == columns[i], 1, 0)
}
But this is extremely slow; the code has been running for 10 hours now. Is there a workaround that I am missing?
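One idea I have not tried at this scale is building the indicators as a sparse matrix with the Matrix package, since a dense 1,558,810 x 4,700 block of doubles alone needs roughly 58 GB:

library(Matrix)
# stores only the non-zero entries (one per row) instead of the
# ~58 GB dense block; dept must be a factor for the formula to expand
ds$dept <- factor(ds$dept)
ind <- sparse.model.matrix(~ dept - 1, data = ds)

I do not know whether the rest of my pipeline would accept a sparse dgCMatrix in place of a data.frame, though.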
Any suggestions would be of great help!