-1
NO. Month  Year  Km   Diff
1    1     2010  200   200
1    1     2010  200   200
1    1     2010  200   200
1    2     2010  400   200
1    2     2010  400   200 
1    2     2010  400   200
1    3     2010  800   400
1    4     2010  1500  700

So, I want to take difference subject to Month and Year conditions having same serial No. column using dplyr R package. So, How to give multiple conditions subject to Serial No. and make cumulative difference?

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Mahesh
  • 47
  • 5
  • Can you clarify what it is you're looking for? What is the expected output? Also, can you refer to the columns as you have written them in your data table? E.g. by "Serial No." are you referring to `NO.`? – bouncyball Dec 27 '16 at 18:29
  • 3
    See [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for a good reproducible example criteria in `R`. `dput`ing your data would be a good start. – Nick Criswell Dec 27 '16 at 18:29
  • Not sure I understand the end goal. Are you wanting to add up the `Km` and `Diff` columns, aggregated by `NO.`, `Month` and `Year` and then compute the difference? If so, you'd do this. `library(dplyr) ; df2 <- df %>% group_by(`NO.`, Month, Year) %>% summarise(tot_km = sum(Km), tot_diff = sum(Diff)) %>% mutate(agg_diff = tot_km - tot_diff)`. – Nick Criswell Dec 27 '16 at 18:53
  • In this I want the expected output shown in column Diff as it is difference in km for the vehicle no. 1 subject to month and year constraint. If specific vehicle is not running then I will take same km reading as existing in the current month. – Mahesh Dec 28 '16 at 02:16
  • df <- data.frame(No.= c(1,1,1,1,1,1,1,2,2,2), Month=c(rep('A',4),rep('B',6)), Year = c(rep('a',4),rep('b',2),rep('c',4)), Km=c(200,200,200,500,400,1200,800,1200,1200,1600), Diff = c(200,200,200,300,400,800,800,1200,1200,400)) Here is dataframe. The expected output should be Diff column in dataframe which is subject to No,month and year. – Mahesh Dec 28 '16 at 04:11

1 Answers1

0

It turns out the first example wasn't complete enough to handle all the cases. Finally, if I have understood correctly, here is an answer with dplyr and na.locf from zoo package.

We create an additional column Value_Change which would have only the first value of the group and the value when it changes, rest all would be NA and then we fill those NA's with latest non-NA values using na.locf

library(dplyr)
library(zoo)
df %>%
   group_by(No.) %>%
    mutate(Value_Change = c(Km[1], ifelse(diff(Km) == 0, NA, diff(Km))), 
           New_diff = na.locf(Value_Change))


#      No.  Month   Year    Km  Diff Value_Change New_diff
#     (dbl) (fctr) (fctr) (dbl) (dbl)         (dbl)    (dbl)
#1      1      A      a   200   200           200      200
#2      1      A      a   200   200            NA      200
#3      1      A      a   200   200            NA      200
#4      1      A      a   200   200            NA      200
#5      1      B      b   500   300           300      300
#6      1      B      b   500   300            NA      300
#7      1      B      c  1000   500           500      500
#8      2      B      c  1200  1200          1200     1200
#9      2      B      d  1200  1200            NA     1200
#10     2      B      d  1600   400           400      400
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • df <- data.frame(No.= c(1,1,1,1,1,1,1,2,2,2), Month=c(rep('A',4),rep('B',6)), Year = c(rep('a',4),rep('b',2),rep('c',2),rep('d',2)), Km=c(200,200,200,200,500,500,1000,1200,1200,1600), Diff = c(200,200,200,200,300,300,500,1200,1200,400) ) If target variable is No. subject to Month and year. Expected Output like column Diff for km column – Mahesh Dec 28 '16 at 09:28
  • @Mahesh The logic isn't clear to me. Also, if this is a new question then I would suggest you should post it as a separate question. – Ronak Shah Dec 28 '16 at 09:46
  • The question is same- If for a No. 1 vehicle with month and year considered, if vehicle is not running for (month +year) combination then it is same km reading else if anyone (month +year) combination changes then it should be difference between current(month +year) combination with previous (month +year) combination. Same for No. 2 vehicle. The dataframe is correct as mentioned above. – Mahesh Dec 28 '16 at 10:40
  • @Mahesh Hi, I have updated the answer. Please have a look if this is what you want. – Ronak Shah Dec 28 '16 at 17:13