
Suppose I have a data set which has 2 columns:

  visit   purchase
  5       2
  7       3

and I want to transform it into one column (for logit regression analysis). In the resulting "purchase" column, 1 means a purchase and 0 means no purchase, and the total number of observations equals the sum of visit.

I have tried

df.expanded <- df[rep(row.names(df), pmax(df$Predators, 1)),]

from this question and successfully expanded the observations. However, I don't know how to transform the values in the "purchase" column after expanding the rows, as it looks like this:

purchase
2
2
2
2
2
3
3
3
3
3
3
3

The number of observations is indeed 12, but the purchase counts were copied along with the rows.

The data set I am working on is quite large, so doing this manually is not feasible.

New:

This is part of my original dataset https://i.stack.imgur.com/DByGX.png

and in R, the data frame is named 'try6'

So I enter this in console:

expand_01 <- function(x) {
  rep(c(1,0),
      c(x[["installs"]],x[["reach"]]-x[["installs"]]))
}
unlist(apply(try6,1,expand_01))

But I get the following error:

Error in x[["reach"]] - x[["installs"]] : non-numeric argument to binary operator

I don't understand this: the error seems to say that the values in those columns are non-numeric (or have I misunderstood?), but both columns contain only numbers.

Thank you for your help!!

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   379 obs. of  7 variables:
 $ dow     : POSIXct, format: "2019-05-09" "2019-05-09" ...
 $ country : chr  "IT" "IT" "IT" "IT" ...
 $ adtype  : chr  "VID" "VID" "VID" "VID" ...
 $ age     : num  6 5 5 4 4 3 3 2 2 2 ...
 $ gender  : num  1 1 2 1 2 1 2 3 1 2 ...
 $ reach   : num  15 26 2 47 4 34 2 1 45 4 ...
 $ installs: num  0 0 0 0 0 1 0 0 0 0 ...

After I put

 try8 <- try6 %>% head() %>% select(reach,installs)

please refer to this picture: https://i.stack.imgur.com/IKggu.png

and then I put

 dput(try8)

and it shows

 structure(list(reach = c(15, 26, 2, 47, 4, 34), installs = c(0, 
 0, 0, 0, 0, 1)), row.names = c(NA, -6L), class = c("tbl_df", 
 "tbl", "data.frame"))

names(try6) is 

[1] "dow"      "country"  "adtype"   "age"      "gender"   "reach"    "installs"

New picture for the following code:

  reach <- try6$reach
  installs <- try6$installs

  new <- rep(0, sum(reach))

  for(j in 1:length(installs)){
  new[(sum(reach[0:(j-1)])+1):(sum(reach[0:(j-1)])+installs[j])] <- 1
  }

Picture: https://i.stack.imgur.com/CXS22.png

Also, sometimes when there are, for example, 4 installs, the result contains five 1s (5 observations instead of 4).

picture: https://i.stack.imgur.com/Yc7tD.png

a lot of thanks!

Wynona
  • can you cut & paste either `str(try6)` or `dput(head(try6))` into your question? – Ben Bolker Aug 07 '19 at 16:46
  • Hello I just edited the question! Many thanks for helping me out:) – Wynona Aug 07 '19 at 17:06
  • problem could be related to `tbl` vs `data.frame` – Ben Bolker Aug 07 '19 at 17:18
  • it's not `tbl`-related, apparently. Can't go farther without a reproducible example: if you can make the error occur with a data set `try7 <- try6 %>% head() %>% select(reach,installs)` (assuming you're using tidyverse), then you should be able to `dput(try7)` and paste the results into your question so we can reproduce the problem. – Ben Bolker Aug 07 '19 at 20:24
  • what is `names(try6)` ? – Ben Bolker Aug 07 '19 at 20:24
  • Hello, thank you for your reply! I edited my question above, many thanks!!! ( I downloaded tidyverse) – Wynona Aug 09 '19 at 10:05

2 Answers

You don't need to transform your data to analyze it; you can run binomial regression:

glm(cbind(purchase,visit-purchase) ~ x1 + x2 + x3 ..., 
          family=binomial(link="logit"),
          data= ...)

This is statistically equivalent to logistic regression and much more efficient!
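
As a rough check of that equivalence, here is a minimal sketch that fits both forms and compares the coefficients. The data frame `agg`, its covariate `x`, and the counts in it are made up purely for illustration; they are not from the question.

```r
# Hypothetical aggregated data: one row per group, with a made-up covariate x
agg <- data.frame(
  x        = c(0, 1, 2, 3),
  visit    = c(5, 7, 6, 8),
  purchase = c(1, 3, 3, 6)
)

# Binomial regression on the aggregated counts (successes, failures)
fit_agg <- glm(cbind(purchase, visit - purchase) ~ x,
               family = binomial(link = "logit"),
               data = agg)

# The same data expanded to one 0/1 row per visit
long <- data.frame(
  x = rep(agg$x, times = agg$visit),
  purchase = unlist(Map(function(s, n) rep(c(1, 0), c(s, n - s)),
                        agg$purchase, agg$visit))
)

# Ordinary logistic regression on the expanded 0/1 data
fit_long <- glm(purchase ~ x, family = binomial(link = "logit"), data = long)

# Coefficients should agree up to numerical tolerance
all.equal(coef(fit_agg), coef(fit_long), tolerance = 1e-4)
```

The coefficient estimates and standard errors should match; the reported deviances differ because the two likelihoods differ by a constant.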

If you really need to expand to zeros and ones ...

dd <- read.table(header=TRUE,
text="
visit   purchase
  5       2
  7       3
")
## convert to tibble, just in case that makes a difference
dd <- tibble::as_tibble(dd)
expand_01 <- function(x) {
     rep(c(1,0),
         c(x[["purchase"]],x[["visit"]]-x[["purchase"]]))
}
unlist(apply(dd,1,expand_01))
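
A likely cause of the "non-numeric argument to binary operator" error reported in the question: `apply()` first converts the data frame with `as.matrix()`, and when the data frame also holds character columns (such as `country` or `adtype`) the whole matrix becomes character, so `x[["reach"]]` is a string. The sketch below avoids `apply()` by working on the two numeric columns directly. `try_demo`, `expand_binary`, and the `installed` column are hypothetical names used only for illustration.

```r
# Toy data mimicking the question's structure: a character column next to numeric ones
try_demo <- data.frame(
  country  = c("IT", "IT", "IT"),
  reach    = c(15, 2, 4),
  installs = c(0, 1, 3),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(try_demo), which is a character matrix here
is.character(as.matrix(try_demo))

# Vectorised expansion: for each row, `successes` ones followed by
# `trials - successes` zeros (hypothetical helper, not from the answer)
expand_binary <- function(successes, trials) {
  stopifnot(all(successes <= trials))
  # Interleave the counts as s1, f1, s2, f2, ...
  counts <- as.vector(rbind(successes, trials - successes))
  rep(rep(c(1, 0), times = length(trials)), times = counts)
}

outcome <- expand_binary(try_demo$installs, try_demo$reach)

# Keep the other columns: repeat each row `reach` times, then attach the 0/1 outcome
long <- try_demo[rep(seq_len(nrow(try_demo)), times = try_demo$reach), ]
long$installed <- outcome
```

Alternatively, restricting the original call to the numeric columns, for example `apply(try6[, c("reach", "installs")], 1, expand_01)`, should also avoid the coercion to character.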
Ben Bolker
  • Thank you for your answer! I am doing a homework and transforming the data set is part of it. Do you also happen to know how to transform it? many thanks for help me out! – Wynona Aug 07 '19 at 13:51
  • thank you for helping me out again:) I tried to expand it and there is an error: object "dd" not found. The original data I am using now is a data frame in R (is it relevant?) many thanks:)) – Wynona Aug 07 '19 at 14:18
  • `dd` is the name of the data frame I used. You should substitute the name of your data frame. – Ben Bolker Aug 07 '19 at 14:23
  • Thank you! You are amazing! There is just this error: `Error in apply(try4, 1, expand_01) : dim(X) must have a positive length`. I tried the suggestion from this site, https://stackoverflow.com/questions/28423275/dimx-must-have-a-positive-length-when-applying-function-in-data-frame/28423503, and used lapply, but it is not working... I just started using R, so I am not very familiar with it; your help is much appreciated:) Many thanks! – Wynona Aug 07 '19 at 14:37
  • can't really answer that without a reproducible example. You could edit your question to include a [mcve] ... – Ben Bolker Aug 07 '19 at 14:45
  • Hello, I updated the question. I don't know if it makes sense. Thank you so much:) – Wynona Aug 07 '19 at 16:37
0

Just using indexing in a for loop as an alternative... it ain't pretty but:

visit <- c(5,7)
buy <- c(2,3)

new <- rep(0, sum(visit))

for(j in 1:length(buy)){
  new[(sum(visit[0:(j-1)])+1):(sum(visit[0:(j-1)])+buy[j])] <- 1
}
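
One caveat with the loop above: when `buy[j]` is 0, the index `(a + 1):(a + 0)` counts downwards and selects two positions, so two 1s are written for a row that should have none. A minimal sketch of a variant that handles zeros, using `seq_len()` and a running `offset` (a name introduced here for illustration), with the first rows of the question's data as input:

```r
visit <- c(15, 26, 2, 47, 4, 34)
buy   <- c(0, 0, 0, 0, 0, 1)

new <- rep(0, sum(visit))

offset <- 0  # number of observations already covered by earlier rows
for (j in seq_along(buy)) {
  # seq_len(0) is empty, so rows with no purchases leave `new` unchanged
  new[offset + seq_len(buy[j])] <- 1
  offset <- offset + visit[j]
}

sum(new)  # total number of 1s; should equal sum(buy)
```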
rg255
  • Hi there! Thank you for helping me out! I tried your formula, but for rows where the original buy equals 0, the outcome of the formula gives two 1s. I uploaded a picture in my question to explain it, thank you in advance:) – Wynona Aug 09 '19 at 12:18
  • The pictures for your formula are at the bottom under 'New', many thanks! – Wynona Aug 09 '19 at 12:51
  • If this doesn't achieve what you want then please describe why in the comments here – rg255 Aug 14 '19 at 04:22