
I've looked here and here for answers, but neither gave me what I need to summarize my data frame. I think this answer is the closest to what I need.

For each month, I want to see what share of all unique clients placed at least one order in that month. The "id" column is the unique client id, and "date" is when the transaction occurred.

Here is what the data looks like:

Sample Data:

id   date
1    3/12/2016
2    3/14/2016
3    3/11/2016
1    4/19/2016
1    4/21/2016
3    5/21/2016
2    6/7/2016
1    6/8/2016

And what I would like the result to be is:

Result:

date     percent  
03-2016  100%
04-2016  33%
05-2016  33%
06-2016  66%

For reference:

length(unique(df$id)) = 3

Suggestions on what I should be doing?

Chef1075

1 Answer


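We can create a TimePeriod column and then use `by` to compute, for each period, the share of unique ids that appear in it:

# parse the month/day/year strings into Date objects
dat$date <- as.Date(dat$date, '%m/%d/%Y')
# paste() separates its arguments with a space by default, hence labels like "2016 - 03"
dat$TimePeriod <- paste(format(dat$date, '%Y'),'-',format(dat$date, '%m'))

# total number of distinct clients (3 in the sample data)
unique_id <- length(unique(dat$id))

# share of distinct clients seen in each period, stacked into a data frame
setNames(stack(
  by(dat, dat$TimePeriod, function(x) length(unique(x$id)) / unique_id)
  ), c('percent', 'date'))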

    percent      date
1 1.0000000 2016 - 03
2 0.3333333 2016 - 04
3 0.3333333 2016 - 05
4 0.6666667 2016 - 06
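
The output above reports fractions and a "2016 - 03" style label, while the question asks for "03-2016" and a percent string. A minimal base R sketch of one way to get that layout follows; the names `month`, `total_ids`, `share`, and `result` are illustrative and not from the original post, and the sample data is rebuilt inline so the block runs on its own.

```r
# Sample data from the question, rebuilt inline so the block is self-contained
dat <- data.frame(
  id   = c(1, 2, 3, 1, 1, 3, 2, 1),
  date = c("3/12/2016", "3/14/2016", "3/11/2016", "4/19/2016",
           "4/21/2016", "5/21/2016", "6/7/2016", "6/8/2016"),
  stringsAsFactors = FALSE
)

dat$date  <- as.Date(dat$date, "%m/%d/%Y")
# A year-first key keeps the groups in chronological order when sorted
dat$month <- format(dat$date, "%Y-%m")

# Total number of distinct clients across the whole data set
total_ids <- length(unique(dat$id))

# Fraction of all distinct clients that appear in each month
share <- tapply(dat$id, dat$month,
                function(ids) length(unique(ids)) / total_ids)

result <- data.frame(
  # Turn the "YYYY-MM" keys into the "MM-YYYY" labels used in the question
  date    = format(as.Date(paste0(names(share), "-01")), "%m-%Y"),
  # Round to whole percents and append the percent sign
  percent = paste0(round(100 * as.vector(share)), "%"),
  stringsAsFactors = FALSE
)

print(result)
```

Note that `round()` turns 2/3 into 67%, whereas the question shows 66%; swapping in `floor()` would reproduce the truncated figure if that is what is wanted.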

data

dat <- read.table(text = 'id   date
1    3/12/2016
2    3/14/2016
3    3/11/2016
1    4/19/2016
1    4/21/2016
3    5/21/2016
2    6/7/2016
1    6/8/2016', header = TRUE, stringsAsFactors = FALSE)
bouncyball
  • I guess a `data.table` version might be something like this: `as.data.table(mydf)[, date := as.IDate(date, format = "%m/%d/%Y")][, list(pct = length(unique(id))/unique_id), .(mon_yr = sprintf("%02d-%s", month(date), year(date)))]`. +1 – A5C1D2H2I1M1N2O1R2T1 Apr 10 '17 at 17:05
  • @A5C1D2H2I1M1N2O1R2T1 : to round to the nearest month, instead of `sprintf` you could use one of the niceties of data.table: `[...] .(mon_yr = round(date, 'month')) [...]`. – Jealie Apr 10 '17 at 17:40
  • @Jealie, Good call! Thanks. – A5C1D2H2I1M1N2O1R2T1 Apr 10 '17 at 17:43
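
The `data.table` one-liner from the comments is easier to follow when split into steps. The sketch below assumes the `data.table` package is installed. It swaps `length(unique(id))` for `uniqueN()` and builds the month label with `format()` rather than `sprintf()`, so it is a variation on the comment, not a verbatim copy. The names `dt`, `total_ids`, and `res` are illustrative.

```r
library(data.table)  # assumes the data.table package is installed

# Sample data from the question
dt <- data.table(
  id   = c(1, 2, 3, 1, 1, 3, 2, 1),
  date = c("3/12/2016", "3/14/2016", "3/11/2016", "4/19/2016",
           "4/21/2016", "5/21/2016", "6/7/2016", "6/8/2016")
)

# Convert the strings to data.table's integer-backed date class, in place
dt[, date := as.IDate(date, format = "%m/%d/%Y")]

# Total number of distinct clients
total_ids <- uniqueN(dt$id)

# Group by a "MM-YYYY" label and compute the share of distinct clients per group
res <- dt[, .(pct = uniqueN(id) / total_ids),
          by = .(mon_yr = format(date, "%m-%Y"))]

print(res)
```

Groups come back in order of first appearance, which is chronological for this sample but would not be for unsorted data.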