Here is the sample dataframe (Market_Test) Not sure of DataFrame comes out properly or not hence the image
Submarket Market Date URL of test
Seattle_Sub1 Seattle 1/2/2016 Abc.com/test1
Seattle_Sub2 Seattle 1/3/2016 Abc.com/test2
Seattle_Sub1 Seattle 1/5/2016 Abc.com/test3
Portland_Sub1 Portland 1/15/2016 Abc.com/test4
Portland_Sub2 Portland 1/10/2016 Abc.com/test3
Portland_Sub2 Portland 1/11/2016 Abc.com/test2
Portland_Sub1 Portland 1/10/2016 Abc.com/test1
I need to calculate and pivot the data on the SubMarket Level and make it look like the following OutputData Frame Image-
Submarket Market No of Days Testing Gap b/w test Days
Seattle_Sub1 Seattle 2 3
Seattle_Sub2 Seattle 1 0
Portland_Sub1 Portland 2 5
Portland_Sub2 Portland 2 1
There are around 300,000 observations so relatively big data set.
I am able to easily solve the No of Days Testing problem using sqldf and distinct count but hit a roadblock with Gap b/w Test Days calculation.
One thing I tried was using rank the original datframe and then calcualte the delta between 2 values of rank but in ddply but that takes very long, infact it never gets completed.
Inputdf <- mutate(Inputdf,Date_Rank = rank(Date))
Otheroption would have been to use lag/lead functions in sql but sqldf does not allow that.
Thanks much in advance.