I have two data frames of volume by date. Both come from the same data, but one is filtered. I'd like to plot a trendline of the ratio of filtered to unfiltered volume on any given day, but I'm having a hard time reshaping the data frames so that they're comparable. Here's an example:
unFiltered <- data.frame(date = c("01-01-2015", "01-01-2015", "01-02-2015"), item = c("item1", "item2", "item1"), volume = c(100, 100, 50))
filtered <- data.frame(date = c("01-01-2015", "01-03-2015"), item = c("item1", "item1"), volume = c(10, 40))
From these data sets, I'd like to construct a third data set that is "The percentage of unfiltered item-volume that is being filtered". That is, I want a data frame that will look like this:
          date  item percentage
1 "01-01-2015" item1         .1
2 "01-01-2015" item2          0
3 "01-02-2015" item1          0
4 "01-02-2015" item2          0
5 "01-03-2015" item1         .8
6 "01-03-2015" item2          0
(Note: neither data frame has 6 entries, but the resulting data frame has one row for every combination of the unique values of item and the unique values of date.)
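To make the target shape concrete, here is a minimal base-R sketch of one way the six-row grid could be built. It is only an illustration under stated assumptions, not a definitive answer: the names grid, result, unfiltered, and filtered (as column names) are introduced here, and a missing filtered volume is assumed to count as 0. The sample unFiltered has no row for item1 on 01-03-2015, so this sketch leaves that ratio as NA rather than guessing a denominator.

```r
unFiltered <- data.frame(date = c("01-01-2015", "01-01-2015", "01-02-2015"),
                         item = c("item1", "item2", "item1"),
                         volume = c(100, 100, 50),
                         stringsAsFactors = FALSE)
filtered <- data.frame(date = c("01-01-2015", "01-03-2015"),
                       item = c("item1", "item1"),
                       volume = c(10, 40),
                       stringsAsFactors = FALSE)

# Every combination of the dates and items seen in either frame (3 x 2 = 6 rows).
grid <- expand.grid(date = unique(c(unFiltered$date, filtered$date)),
                    item = unique(c(unFiltered$item, filtered$item)),
                    stringsAsFactors = FALSE)

# Left-join each frame onto the grid; combinations with no match get NA.
result <- merge(grid, unFiltered, by = c("date", "item"), all.x = TRUE)
names(result)[names(result) == "volume"] <- "unfiltered"
result <- merge(result, filtered, by = c("date", "item"), all.x = TRUE)
names(result)[names(result) == "volume"] <- "filtered"

# Assumption: no filtered row means nothing was filtered, i.e. a volume of 0.
result$filtered[is.na(result$filtered)] <- 0

# Ratio of filtered to unfiltered volume. It is 0 wherever nothing was filtered,
# and stays NA where filtered volume exists but the unfiltered volume is missing.
result$percentage <- ifelse(result$filtered == 0, 0,
                            result$filtered / result$unfiltered)

# Order by real calendar date, then item.
result <- result[order(as.Date(result$date, format = "%m-%d-%Y"), result$item), ]
print(result[, c("date", "item", "percentage")])
```

The key step is expand.grid, which produces the full date-by-item grid; merge with all.x = TRUE then keeps every grid row even when a frame has no matching entry.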
Does anyone have any ideas? I've been stuck on this for ~2 hours, fumbling around with for loops, merging, joins, manually creating data frames, etc. If anyone has a solution, would you mind explaining what's going on in it, too? (I still kind of suck at R, and I often read code that someone writes without having any idea why it actually works.)