I have a dataframe df1:
df1 <- data.frame(
Lot = c("13VC011","13VC018","13VC011A","13VC011B","13VC018A","13VC018C","13VC018B"),
Date = c("2013-07-12","2013-07-11","2013-07-13","2013-07-14","2013-07-16","2013-07-18","2013-07-19"),
Step = c("A","A","B","B","C","C","C"),
kg = c(31,32,14,16,10,11,10))
Sometimes at a particular 'Step' a 'Lot' gets split into A,B or C as indicated. I'd like to sum those and get a dataframe that tells me the total kg at each step, for each lot.
For example the output should look like this:
df2 <- data.frame(
Lot = c("13VC011","13VC011","13VC018","13VC018"),
Step = c("A","B","A","C"),
kg = c(31,30,32,31))
So there are two requirements. If the 'Lot' matches, regardless of the trailing letter, and the step matches, then the sum occurs. If both conditions are not satisfied, then just carry over the line item as is into df2.
Part2: So I would like to introduce a 3rd requirement. In some cases, the Lot was split in two or 3 parts, however not all the data is present. In this case, using these solutions masks this and makes it appear that one lot has much lower kg than it actually has.
What I would like to do is find a way to indicate if the dataset contains 13VC011A for example, but no 13VC011B. Or if we see a 'B' but no 'A' or a 'C' but no 'B' or 'A'.
So now the original dataframe is:
df1 <- data.frame(
Lot = c("13VC011","13VC018","13VC011A","13VC011B","13VC018A","13VC018C","13VC018B","13VC020B"),
Date = c("2013-07-12","2013-07-11","2013-07-13","2013-07-14","2013-07-16","2013-07-18","2013-07-19","2013-07-22"),
Step = c("A","A","B","B","C","C","C","B"),
kg = c(31,32,14,16,10,11,10,18))
And the resultant df2 should look something like:
df2 <- data.frame(
Lot = c("13VC011","13VC011","13VC018","13VC018","13VC020B"),
Step = c("A","B","A","C","B"),
kg = c(31,30,32,31,18),
Partial = c(F,F,F,F,T))