I have two data frames as follows
The first is a survey table which tells when a person's survey was taken
ID = c('1000021','1000021')
SurveyDate = c('2014-05-30','2013-05-01')
dfsurvey = data.frame(ID,SurveyDate)
> dfsurvey
ID SurveyDate
1 1000021 2014-05-30
2 1000021 2013-05-01
The second is a hobbies table which tells the person's hobbies recorded on that day. On different days, his hobbies could be different.
ID = c('1000021','1000021','1000021','1000021','1000021','1000021','1000021')
HobbyName = c('Running','Volleyball','Pingpong','Badminton','Swimming','Running','Pingpong')
SurveyDate = c('2014-05-30','2014-05-30','2014-05-30','2014-05-30','2014-05-30','2013-05-01','2013-05-01')
dfhobby = data.frame(ID,HobbyName,SurveyDate)
> dfhobby
ID HobbyName SurveyDate
1 1000021 Running 2014-05-30
2 1000021 Volleyball 2014-05-30
3 1000021 Pingpong 2014-05-30
4 1000021 Badminton 2014-05-30
5 1000021 Swimming 2014-05-30
6 1000021 Running 2013-05-01
7 1000021 Pingpong 2013-05-01
To the survey table which has only two rows, I would like to add the expanded list of hobbies, each hobby getting it's own column, what I would call "flattening". Something like this,
#expected final output - add columns to dfsurvey
> dfsurvey
ID SurveyDate Hobby_Running Hobby_Volleyball Hobby_Pingpong Hobby_Badminton Hobby_Swimming
1 1000021 1 1 1 1 1
2 1000021 1 0 1 0 0
This is my code I basically first construct the column names, and then use a nested for loop to mark 1 against the hobby. However, this is very very slow, around one second for one iteration of the nested for loop
#making columns and setting them to 0 as default
hobbyvalues = unique(dfhobby$HobbyName)
for(i in 1:length(hobbyvalues))
{
print(i)
dfsurvey[paste("Hobby_",hobbyvalues[i],sep="")] = 0
}
#flattening iterative
for(i in 1:nrow(dfsurvey))
{
print(i)
listofhobbies = dfhobby[which(dfhobby$ID == dfsurvey[i,"ID"] & dfhobby$SurveyDate == dfsurvey[i,"SurveyDate"]),"HobbyName"]
if(length(listofhobbies) > 0)
{
for(l in 1:length(listofhobbies))
{
dfsurvey[i,paste("Hobby_",listofhobbies[l],sep="")] = 1
}
}
}
I have also tried the foreach package and doMC package and was able to write code in parallel. However, this is slow as well.
Is there a better way or library in R which can help me do this? Thanks.