I have a data frame with non-numeric values with the following format:
DF1:
col1 col2
1 a b
2 a c
3 z y
4 z x
5 a d
6 m n
I need to convert it into this format,
DF2:
col1 col2 col3 col4
1 a b c d
2 z y x NA
3 m n NA NA
With col1 as the primary key (not sure if this is the right terminology in R), and the rest of the columns contain the elements associated with that key (as seen in DF1).
DF2 will include more columns compared to DF1 depending upon the number of elements associated with any key.
Some columns will have no value resulting from different number of elements associated with each key, represented as NA (as shown in DF2).
The column names could be anything.
I have tried to use the reshape(), melt() + cast(), even a generic for loop where I use cbind and try to delete the row.
It is part of a very big dataset with over 50 million rows. I might have to use cloud services for this task but that is a different discussion.
I am new to R so there might be some obvious solution which I am missing.
Any help would be much appreciated.
-Thanks