I have a R data frame with 3 columns
- timestamp
- Category
- Value
I am trying to find an elegant way(ideally) to find where the values increased, or decreased, by X percent within a specified time frame. For example, I'd like to know all points in the data where Value increased by 50% or more within 1 week.
Are there any built in funcitons of packages where I can just pass a percentage and a number of days and have it return which rows in the data frame are a match?
something along these lines(pseudo code below):
RowsThatareAMatch <- findmatches(date=MyDF$Timestamp, grouping=MyDF$Category, data=MyDF$Value, growth=0.5, range=7)
The thing that is throwing me off is that I want it returning the rows for each Category that has values, and not just look at every value in the data frame. So if Category A & B had growth of 50% or more within 7 days 8 times in my data, I want those rows returned, and if categories C, D, & E didn't every have that kind of increase I don't want data from those categories returned at all.
Right now I am looking at systematically splitting the data frame into multiple data frames for each category and then doing the analysis on each individual data frame. While that approach could work, something is telling me that R has an easier way to do this.
Thoughts?
edit: Ideally what I am looking for returned is a data frame with 3 columns, and 1 row for every match in my data.
- Category
- Start timestamp of the match
- End timestamp of the match.
Based in my experience with R I would need to identify the row numbers for each grouping and then I could extract the above data from the original data frame, but if there's any good way to go straight to the above output that would be awesome too!
Sample Data
So I have a CSV like this:
Timestamp,Category,Value
2015-01-01,A,1
2015-01-02,A,1.2
2015-01-03,A,1.3
2015-01-04,A,8
2015-01-05,A,8.2
2015-01-06,A,9
2015-01-07,A,9.2
2015-01-08,A,10
2015-01-09,A,11
2015-01-01,B,12
2015-01-02,B,12.75
2015-01-03,B,15
2015-01-04,B,60
2015-01-05,B,62.1
2015-01-06,B,63
2015-01-07,B,12.3
2015-01-08,B,10
2015-01-09,B,11
2015-01-01,C,100
2015-01-02,C,100000
2015-01-03,C,200
2015-01-04,C,350
2015-01-05,C,780
2015-01-06,C,780.2
2015-01-07,C,790
2015-01-08,C,790.3
2015-01-09,C,791
2015-01-01,D,0.5
2015-01-02,D,0.8
2015-01-03,D,0.83
2015-01-04,D,2
2015-01-05,D,0.01
2015-01-06,D,0.03
2015-01-07,D,0.99
2015-01-08,D,1.23
2015-01-09,D,5
I would read that into R like this
df <- read.csv("CategoryMeasurements.csv", header=TRUE)