I have my data that looks like below: (basically lets say its sales of different brands by customer trip, blank means the brand isnt purchased in that particular trip for a customer,store refers to store location of purchase)
customerid date store brand1 brand2 brand3 brand4
1 01-03-2012 a $3.00 $- $- $2.00
1 06-03-2012 a $2.00 $- $- $3.00
1 11-03-2012 b $2.00 $1.00 $1.00 $1.00
1 26-03-2012 a $2.00 $- $- $-
2 16-03-2012 d $2.00 $1.00 $1.00 $2.00
2 21-03-2012 a $- $- $1.00 $2.00
2 26-03-2012 a $2.00 $1.00 $3.00 $1.00
i would like to create separate dataframes for each brand containing rows only where that brand has sales >0 , so what i thought.. i could put the brand1-brand4 in a list called colnames_df, like below:
colnames_df<- colnames(myDf)
Once i do this, i can loop through the contents of this loop to generate the brand level datasets... From data above, i need 4 separate datasets with the relevant brand column and other columns of custID, date alone.. like 4 datasets below is what i want:
Dataset for brand1: (expected output)
customerid date store brand1
1 01-03-2012 a $3.00
1 06-03-2012 a $2.00
1 11-03-2012 b $2.00
1 26-03-2012 a $2.00
2 16-03-2012 d $2.00
2 26-03-2012 a $2.00
Dataset for brand2: (expected output)
customerid store date brand2
1 b 11-03-2012 $1.00
2 d 16-03-2012 $1.00
2 a 26-03-2012 $1.00
Likewise there will be dataframes for Brand3 and 4 as well...For this part, should i write something like for( i in length(colnames_df) { paste("Brand",i)<-}... Not sure how to write this.. I need to create brand level data frames from original data above...If i use lapply andsuch functions, i was able to figure out how to obtain list/dataframe with all columns in resulting data.. How do i do what i need to do above...
Apart from above, i have one other requirement:
If the brand level datasets are created, i also need to create lag, counter variables like below on each brand level dataset...
- Step1: Create counter variable for each customer trip (after dataset is sorted by custID and Date)...
Expected output for brand1(with counter):
code which i use (im having difficulty putting this code in a loop so that each brand level dataset that is created has the new variable automatically created..instead of brand1 below, it should automatically be brand1,2,3,4 etc)
brand1$counter <- with(brand1, ave(customerID, customerID, FUN = seq_along))
customerid date store brand1 counter_custtrip
1 01-03-2012 a $3.00 1
1 06-03-2012 a $2.00 2
1 11-03-2012 b $2.00 3
1 26-03-2012 a $2.00 4
2 16-03-2012 d $2.00 1
2 26-03-2012 a $2.00 2
2.step2: Create a lag variable....like expected output below..
I can use code like this: (my question is i can do these operations separately per dataset, but how do i do it in such a way that all this happens as each brand level dataset gets created...???)
ddply(.data = df, .variables = .(customerID), mutate,
lagdate = c(NA, head(date, -1))
Expected output is: (for brand1 dataset)
customerid date store brand1 counter_custtrip laggedtripdate
1 01-03-2012 a $3.00 1 -
1 06-03-2012 a $2.00 2 01-03-2012
1 11-03-2012 b $2.00 3 06-03-2012
1 26-03-2012 a $2.00 4 11-03-2012
2 16-03-2012 d $2.00 1 -
2 26-03-2012 a $2.00 2 16-03-2012
- step3: Create days between trips variable by store
See expected output for brand1 (and likewise for all brands)
customerid date store brand1 counter_custtrip laggedtripdate daysbetweentrips
1 01-03-2012 a $3.00 1 - -
1 06-03-2012 a $2.00 2 01-03-2012 5
1 11-03-2012 b $2.00 3 -
1 26-03-2012 a $2.00 4 06-03-2012 20
2 16-03-2012 d $2.00 1 - -
2 26-03-2012 a $2.00 2 16-03-2012 -
As we can see, CustomerID 1 has gone to store a on 3/1, and then 5 days later on 3/6 and then 20 days later on 3/26.. This is the logic.. How do i do this for each customer for each store??...
I know there is a lot, and i am almost there, i just need a couple of lines of advice on how i can put this whole structure together, so that i can put this in a loop with new brand level datasets getting created and each of them having all new variables created during the dataframe creation process....
Let me know what i am missing