
A similar question has been asked before, but the solution does not work on my dataset.

I have a dataset called "daten" in which customers bought more than one item on the same day. I want to count the items each customer bought on the same day. Each customer has a unique ID, so I want to create a new variable that counts the rows sharing the same order date and the same ID.

The purpose is to count the items bought on the same day; items the user bought on a different day should not be included in that count.

So the data should look like this, with the new variable "number of items":

Order Date    User ID    Number of Items
31.05.2016    1          2
31.05.2016    1          2
01.09.2015    1          1
01.06.2017    15         1
07.08.2016    2          3
07.08.2016    2          3
07.08.2016    2          3

I tried the `ave` and `aggregate` functions, but there must be something wrong with the logic of my code.
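A minimal base-R sketch of the `ave()` approach, assuming hypothetical column names `order_date` and `user_id` and using the rows from the table above as sample data. `ave()` returns one value per row, which is the shape asked for here; the comments also note two possible pitfalls with `ave()`, which may or may not be what went wrong in the original code.

```r
# Hypothetical sample data reproducing the table in the question;
# the column names order_date and user_id are assumptions.
daten <- data.frame(
  order_date = c("31.05.2016", "31.05.2016", "01.09.2015", "01.06.2017",
                 "07.08.2016", "07.08.2016", "07.08.2016"),
  user_id    = c(1, 1, 1, 15, 2, 2, 2),
  stringsAsFactors = FALSE
)

# ave() splits its first argument by every combination of the grouping
# vectors, applies FUN to each group and writes the result back into every
# row of that group. With an integer row index as input, length() simply
# counts the rows in each (order_date, user_id) group.
# Pitfall 1: FUN must be passed by name; a positional `length` would be
#   treated as another grouping vector.
# Pitfall 2: passing a character column (e.g. daten$order_date) as the first
#   argument returns the counts as character strings, because ave() writes
#   the results back into a copy of that vector.
daten$number_of_items <- ave(seq_len(nrow(daten)),
                             daten$order_date, daten$user_id,
                             FUN = length)

print(daten)
```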

Ronak Shah
  • Can you show the original dataset? You can just use `group_by` from `tidyverse` for this task. – m0nhawk Jan 14 '18 at 18:17
  • I would suggest something like `aggregate(dat$item, dat[c("orderdate","userid")], length)`, but without knowing anything more about your data, it's a bit difficult. However, your output makes no sense: if you want to group by date and userid, why do you have repeated rows? – r2evans Jan 14 '18 at 18:20
  • My original dataset has 100,000 rows, and frankly, I don't know how to show it here. –  Jan 14 '18 at 18:22
  • @r2evans I am trying to predict the likelihood of an item being returned, so I want to add a new variable that tells how many items a customer bought on the same day. –  Jan 14 '18 at 18:27
  • If you have a large data.frame, use `dput(head(daten, 30))` and then post the output of this in the ***question***, not as a comment. (That command will output the first 30 rows in a format that we can use.) – Rui Barradas Jan 14 '18 at 18:34
  • There are resources available for helping you help us. Two of the more popular ones are https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and https://stackoverflow.com/help/mcve. Regardless, part of programming is being able to reduce required data and functionality to ensure that you get exactly what you want. If that means you filter your data down to 1 week (or month) of purchases for only two user ids, then that will sufficiently demonstrate the functionality you need. – r2evans Jan 14 '18 at 18:59
  • `daten<-join(daten, count(daten, c("order_date", "user_id")))` solved the problem. Thanks! –  Jan 14 '18 at 19:07
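The `join(daten, count(...))` one-liner from the comments first counts rows per (`order_date`, `user_id`) combination and then joins those counts back onto every row. A rough base-R equivalent, sketched here without the plyr package and with hypothetical sample data taken from the question's table, combines the `aggregate()` call suggested in the comments with `merge()`:

```r
# Hypothetical sample data; column names follow the plyr one-liner,
# values follow the table in the question.
daten <- data.frame(
  order_date = c("31.05.2016", "31.05.2016", "01.09.2015", "01.06.2017",
                 "07.08.2016", "07.08.2016", "07.08.2016"),
  user_id    = c(1, 1, 1, 15, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Step 1: one row per (order_date, user_id) group with its row count.
# aggregate() names the result column "x"; rename it for clarity.
counts <- aggregate(daten$user_id,
                    by  = daten[c("order_date", "user_id")],
                    FUN = length)
names(counts)[names(counts) == "x"] <- "number_of_items"

# Step 2: attach the group count to every original row (a left join).
# merge() may reorder rows, so remember the original order first.
daten$row_id <- seq_len(nrow(daten))
merged <- merge(daten, counts, by = c("order_date", "user_id"), all.x = TRUE)
merged <- merged[order(merged$row_id), ]
merged$row_id <- NULL

print(counts)
print(merged)
```

On its own, `aggregate()` collapses the data to one row per group, which is why the comments question the repeated rows in the desired output; merging the counts back restores the one-row-per-purchase layout the question asks for.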
