mydata <- data.frame(id = c(1, 1, 1, 2, 2, 3, 4),
                     hobby = c("music", "sports", "science", "science", "lifestyle",
                               "party", "sports"),
                     x = c(10, 10, 10, 23, 23, 11, 0),
                     y = c(78, 78, 78, 55, 55, 22, 9))
> mydata
id hobby x y
1 1 music 10 78
2 1 sports 10 78
3 1 science 10 78
4 2 science 23 55
5 2 lifestyle 23 55
6 3 party 11 22
7 4 sports 0 9
I have a data.frame in long format with 5 unique hobbies: music, sports, science, lifestyle, and party. What's a quick way in R to obtain 5 data.frames, one per hobby, in which the hobby column is a 0/1 indicator of whether that id has the hobby?
The reason is that I want to run the following regression model 5 separate times, once for each unique hobby:
glm(y ~ hobby + offset(x), family = "poisson", data = dat_music)
glm(y ~ hobby + offset(x), family = "poisson", data = dat_sports)
glm(y ~ hobby + offset(x), family = "poisson", data = dat_science)
glm(y ~ hobby + offset(x), family = "poisson", data = dat_lifestyle)
glm(y ~ hobby + offset(x), family = "poisson", data = dat_party)
For each hobby, I want to summarize the data so that each row corresponds to a unique id.
For dat_music, I want to have:
id hobby x y
1 1 1 10 78
2 2 0 23 55
3 3 0 11 22
4 4 0 0 9
For dat_sports, I want to have:
id hobby x y
1 1 1 10 78
2 2 0 23 55
3 3 0 11 22
4 4 1 0 9
And so forth. Suppose that in reality I have 50k unique hobbies. What's an efficient way to do this in R?
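For reference, here is a rough base-R sketch of the output I am after. It is not a definitive solution. It assumes x and y are constant within each id (as in the example), and the names base, ids_by_hobby, dat_list, and fit_hobby are only illustrative. The idea is to build the one-row-per-id table once, split the ids by hobby once, and then fill in the 0/1 column per hobby:

```r
# Example data from the question
mydata <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 4),
  hobby = c("music", "sports", "science", "science", "lifestyle",
            "party", "sports"),
  x     = c(10, 10, 10, 23, 23, 11, 0),
  y     = c(78, 78, 78, 55, 55, 22, 9)
)

# One row per id. ASSUMPTION: x and y do not vary within an id.
base <- unique(mydata[, c("id", "x", "y")])
rownames(base) <- NULL

# For each hobby, the ids that have it (computed once, not once per hobby)
ids_by_hobby <- split(mydata$id, mydata$hobby)

# One data.frame per hobby, with hobby recoded as a 0/1 indicator
dat_list <- lapply(ids_by_hobby, function(ids) {
  d <- base
  d$hobby <- as.integer(d$id %in% ids)
  d[, c("id", "hobby", "x", "y")]
})

# Hypothetical helper that fits the model from the question to one data.frame
fit_hobby <- function(d) {
  glm(y ~ hobby + offset(x), family = "poisson", data = d)
}

dat_list$music
dat_list$sports
```

With 50k hobbies it would probably be better not to keep all the data.frames in memory and instead call something like fit_hobby inside the loop, storing only the coefficients, but I am not sure this is the most efficient route.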