Difference between rows based on a condition on R

Question

I searched a lot for this answer, but I couldn't find anything that would help me, so here it goes:

I need to change the value on a certain group of rows based on the value from another group. My dataframe is organized like this:

df <- data.frame(group = c("Apple Juice", "Apple Juice", "Apple Juice", "Bottle", "Bottle", "Bottle"), month = c(1,2,3,1,2,3), cost = c(10,10,10,4,4,4))

group	month	cost
Apple Juice	1	10
Apple Juice	2	10
Apple Juice	3	10
Bottle	1	4
Bottle	2	4
Bottle	3	4

In which the cost of the bottle is a part of the apple juice's cost, but I need to take this value off for the second part of the job, while keeping the same structure:

group	month	cost
Apple Juice	1	6
Apple Juice	2	6
Apple Juice	3	6
Bottle	1	4
Bottle	2	4
Bottle	3	4

Usually I do this with dplyr and it used to work:

df <- df %>%
  group_by(group , month) %>%
  dplyr::summarise(cost = -cost[group == 'Bottle'], group = 'Apple Juice') %>%
  ungroup()

But for some reason when I'm running this code I'm getting this result:

group	month	cost
Apple Juice	1	-4
Apple Juice	2	-4
Apple Juice	3	-4

How can I fix this?

score 1 · Accepted Answer · answered Sep 13 '22 at 16:38

1

We could only group by 'month' and get the difference

library(dplyr)
df %>%
   group_by(month) %>%
   mutate(cost = replace(cost, group != 'Bottle', diff(cost[2:1]))) %>% 
   ungroup

-output

# A tibble: 6 × 3
  group       month  cost
  <chr>       <dbl> <dbl>
1 Apple Juice     1     6
2 Apple Juice     2     6
3 Apple Juice     3     6
4 Bottle          1     4
5 Bottle          2     4
6 Bottle          3     4

answered Sep 13 '22 at 16:38

akrun

874,273
37
540
662

1

This worked wonderfully to the dataframe I had, thank you very much! I'm having some trouble trying to adapt to other dataframes with more groups and columns, but the other code I was using is working for those cases. – Drahal Sep 13 '22 at 18:42
1

@Drahal I was using `diff` as I find only two groups. If you are using a default group as 'Bottle', then instead of `diff` can subset the `cost` where `group == 'Bottle'` and use `-` (depends on the number of groups) – akrun Sep 13 '22 at 18:43

score 1 · Answer 2 · answered Sep 13 '22 at 16:44

The solution from akrun is a little more elegant but throwing my hat in the ring.

  df%>%
  pivot_wider(id_col = month,
              names_from = group,
              values_from = cost)%>%
  mutate(`Apple Juice` = `Apple Juice` - Bottle)%>%
  gather(key = 'group',
         value = 'cost',
         -c("month")) -> df_transformed

Difference between rows based on a condition on R

2 Answers2

Linked