I am new to R programming. I have invoice details of each customer, with dates and would like to know average buying cycle for each customer. Data is like:
----------------------------
|Cust_no |Invoice_no | Date|
----------------------------
| C001 | W012 | 4/1/2015
-----------------------------------
| C001 | W234 | 4/11/2015
-----------------------------------
|C001 | W321 | 5/1/2015
-----------------------------------
|C002 | W131 | 4/1/2015
----------------------------------
|C002 | W245 | 4/13/2015
----------------------------------
|C003 | W047 | 3/18/2015
----------------------------------
so for each customer, the code should sum the difference between consecutive dates and divide it by (n-1). n= no. of transactions for each customer. if there is only one transaction present, then I would like to output a default no. (say 45 days)
next after getting avg. buying cycle (C) for each customer, I would like like to see which customer buying cycle is approaching(to send notification) example: flag 'buying period near' as '1' if: (C-2)*n <= days passed since last invoice <= (C+2)*n n=1,2,3...
suppose today is 15 may 2015, then we will have following output.
Expected output For customer C001, average buying cycle would be = (10+20)/(3-1)= 15 days. For customer C002, average buying cycle would be = (10)/(2-1)= 10 days For customer C003, 45 days since only one transaction, given default value of 45.
---------------------------------------------------------------------------
|Cust_no |Buying_cycle|last_invoice_date|time_since_last_inv|buy_perd_near
---------------------------------------------------------------------------
| C001 | 15 | 5/1/2015 |14 | 1
---------------------------------------------------------------------------
| C002 | 12 | 4/13/2015 |32 | 0
---------------------------------------------------------------------------
| C003 | 45 | 3/18/2015 |68 | 0
---------------------------------------------------------------------------
I am facing challenge to find avg. buying period. Once I can get the avg. buying period, I can apply logic to find whether buying period is near or not.
Note: The raw data will not be sorted by customer or ascending/descending dates.
{
structure(list(cust_no = structure(c(1L, 1L, 1L, 2L, 2L, 3L), .Label = c("c001",
"c002", "c003"), class = "factor"), inv_no = structure(c(1L,
3L, 4L, 2L, 6L, 5L), .Label = c("w015", "w087", "w167", "w231",
"w234", "w456"), class = "factor"), date = structure(c(1L, 2L,
5L, 1L, 3L, 4L), .Label = c("2015-4-1", "2015-4-11", "2015-4-13",
"2015-4-17", "2015-5-1"), class = "factor")), .Names = c("cust_no",
"inv_no", "date"), row.names = c(NA, -6L), class = "data.frame")
}