My source data has data corresponding to a few months, but of those, I only want to compare data from pre-specified months.
Here's my input data:
dput(mydf)
structure(list(Month = structure(c(1L, 2L, 1L, 2L, 3L, 1L, 2L,
2L, 1L, 2L, 1L), .Label = c("Aug", "Oct", "Sep"), class = "factor"),
Pipe = c(3, 4, 5, 3, 2, 1, 3, 3, 4, NA, 5), Gp = structure(c(1L,
1L, 2L, 2L, 2L, 3L, 4L, 5L, 5L, 6L, 6L), .Label = c("A",
"B", "C", "D", "E", "F"), class = "factor")), .Names = c("Month",
"Pipe", "Gp"), row.names = c(NA, -11L), class = "data.frame")
Now, of the three months, I only want to compare the months specified by below variables.
This_month_to_compare <- "Oct"
Last_Month_to_compare <- "Aug"
Now, for given two months
above and based on groupings Gp
, I want to indicate whether Pipe
value in This_month_to_compare
is greater than that in Last month to compare
. If one of the two pipe
values don't exist, we will leave it blank.
Here's what the output looks like (manually created because I wasn't successful with the code)
structure(list(Month = structure(c(1L, 2L, 1L, 2L, 3L, 1L, 2L,
2L, 1L, 2L, 1L), .Label = c("Aug", "Oct", "Sep"), class = "factor"),
Pipe = c(3, 4, 5, 3, 2, 1, 3, 3, 4, NA, 5), Gp = structure(c(1L,
1L, 2L, 2L, 2L, 3L, 4L, 5L, 5L, 6L, 6L), .Label = c("A",
"B", "C", "D", "E", "F"), class = "factor"), Greater = c(NA,
TRUE, NA, FALSE, NA, NA, NA, FALSE, NA, NA, NA)), .Names = c("Month",
"Pipe", "Gp", "Greater"), row.names = c(NA, -11L), class = "data.frame")
Month Pipe Gp Greater Explanation
Aug 3 A Ignore: Aug
Oct 4 A TRUE 4 > 3
Aug 5 B Ignore: Aug
Oct 3 B FALSE 3< 5
Sep 2 B Ignore: Sep
Aug 1 C Ignore: Aug
Oct 3 D There is nothing to compare with
Oct 3 E FALSE 3<4
Aug 4 E Ignore: Aug
Oct F Cannot compare NA with 5
Aug 5 F Ignore: Aug
I have manually added explanation above.
I did try to code, and here's my attempt:
mydfi<-data.table::as.data.table(mydfi)
mydf<-mydfi
#Method 1: Convert to Wide Format
#Convert to wide format
mydf<-data.table::dcast(mydf,Gp ~ Month, value.var = "Pipe")
#Compare
mydf$Growth<-mydf[[This_month_to_compare]]>mydf[[Last_Month_to_compare]]
#Back to long format
Melt_columns<-c("Aug","Oct","Sep")
mydf<-data.table::melt(mydf, measure.vars =Melt_columns,variable.name = "Month", value.name = "Pipe")
mydfo<-mydf[mydfi,on=c("Month","Gp","Pipe")]
mydfo[Month!=This_month_to_compare,"Growth"]<-NA
An update: I was able to solve above problem by just adding a left join. I have updated my code above. However, I am looking for a solution on these lines: Calculate difference between values in consecutive rows by group
The reason is that my actual data set is very big and doesn't permit joins.
Any help would be greatly appreciated. Thanks in advance.