Suppose I have a data set which has 2 columns:
visit purchase
5 2
7 3
and I want to transform it into a single column (for logistic regression analysis). In the new "purchase" column, 1 means a purchase and 0 means no purchase, and the total number of observations should equal the sum of visit.
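To make the target concrete, here is a small sketch of the sample data and the output I am after. The name `expected` is only for illustration, and putting the 1s before the 0s within each row's block is an assumption; only the counts matter.

```r
# Sample data from the table above
df <- data.frame(visit = c(5, 7), purchase = c(2, 3))

# Desired single column, written out by hand:
# row 1 -> two 1s and three 0s, row 2 -> three 1s and four 0s
expected <- c(1, 1, 0, 0, 0,
              1, 1, 1, 0, 0, 0, 0)

length(expected) == sum(df$visit)     # one observation per visit
sum(expected) == sum(df$purchase)     # one 1 per purchase
```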
I have tried
df.expanded <- df[rep(row.names(df), pmax(df$visit, 1)), ]
from this question, and it successfully expanded the rows. However, I don't know how to transform the values in the "purchase" column after expanding, because the result looks like this:
purchase
2
2
2
2
2
3
3
3
3
3
3
3
The number of observations is indeed 12, but the purchase count was copied into every expanded row as well.
The data set I am working with is quite large, so doing this manually is not feasible.
New:
This is part of my original dataset https://i.stack.imgur.com/DByGX.png
In R, the data frame is named 'try6', so I entered this in the console:
expand_01 <- function(x) {
rep(c(1,0),
c(x[["installs"]],x[["reach"]]-x[["installs"]]))
}
unlist(apply(try6,1,expand_01))
But I get the following error:
Error in x[["reach"]] - x[["installs"]] : non-numeric argument to binary operator
I don't understand this: the error seems to say that the values in those columns are non-numeric (unless I have misunderstood), but there are only numbers in the two columns.
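A possible cause worth checking, not confirmed on the full data: `apply()` converts a data frame to a matrix before looping over rows, and a matrix can hold only one type, so the chr columns in try6 would turn `reach` and `installs` into strings as well. The sketch below uses a hypothetical toy data frame `toy` rather than try6 itself.

```r
# Hypothetical toy data mirroring the column types of try6
toy <- data.frame(
  country  = c("IT", "IT"),
  reach    = c(3, 2),
  installs = c(1, 0),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(toy); one chr column makes every cell character
m <- as.matrix(toy)
is.character(m)

expand_01 <- function(x) {
  rep(c(1, 0), c(x[["installs"]], x[["reach"]] - x[["installs"]]))
}

# Passing only the numeric columns keeps the matrix numeric
result <- unlist(apply(toy[, c("reach", "installs")], 1, expand_01))
result
```

If that is the cause, selecting `reach` and `installs` before calling `apply()` should avoid the error.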
Thank you for your help!!
Output of str(try6):
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 379 obs. of 7 variables:
$ dow : POSIXct, format: "2019-05-09" "2019-05-09" ...
$ country : chr "IT" "IT" "IT" "IT" ...
$ adtype : chr "VID" "VID" "VID" "VID" ...
$ age : num 6 5 5 4 4 3 3 2 2 2 ...
$ gender : num 1 1 2 1 2 1 2 3 1 2 ...
$ reach : num 15 26 2 47 4 34 2 1 45 4 ...
$ installs: num 0 0 0 0 0 1 0 0 0 0 ...
After I ran
try8 <- try6 %>% head() %>% select(reach,installs)
(see this picture: https://i.stack.imgur.com/IKggu.png)
I then ran
dput(try8)
which printed
structure(list(reach = c(15, 26, 2, 47, 4, 34), installs = c(0,
0, 0, 0, 0, 1)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
names(try6) gives
[1] "dow" "country" "adtype" "age" "gender" "reach" "installs"
New picture, for this code:
reach <- try6$reach
installs <- try6$installs
new <- rep(0, sum(reach))
for(j in 1:length(installs)){
new[(sum(reach[0:(j-1)])+1):(sum(reach[0:(j-1)])+installs[j])] <- 1
}
Picture: https://i.stack.imgur.com/CXS22.png
Also, sometimes when there are, for example, 4 installs, the new result contains five 1s (5 observations instead of 4).
picture: https://i.stack.imgur.com/Yc7tD.png
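A minimal reproduction of the extra 1s, using the six rows from the dput() output. It assumes the cause is that `a:b` counts downwards when `b < a`, so a row with zero installs still yields two indices instead of none. The `seq_len()` variant, with the illustrative names `fixed` and `offset`, is a possible guard and not a confirmed fix for the full data.

```r
# First six rows of reach/installs, copied from the dput() output
reach    <- c(15, 26, 2, 47, 4, 34)
installs <- c(0, 0, 0, 0, 0, 1)

# With zero installs the range runs backwards instead of being empty
(15 + 1):(15 + 0)

# The loop as posted
new <- rep(0, sum(reach))
for (j in 1:length(installs)) {
  new[(sum(reach[0:(j-1)])+1):(sum(reach[0:(j-1)])+installs[j])] <- 1
}
sum(new)   # more 1s than sum(installs)

# Possible guard: seq_len(0) is empty, so rows with no installs assign nothing
fixed <- rep(0, sum(reach))
for (j in seq_along(installs)) {
  offset <- sum(reach[seq_len(j - 1)])   # observations used by earlier rows
  fixed[offset + seq_len(installs[j])] <- 1
}
sum(fixed) == sum(installs)
```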
Thanks a lot!