0

I want to add another column to my dataframe in R, containing the sum of the amount of orders, when the other columns are the same.

For example if this is my data frame:

orderdate / orderhour / deliverydate / deliveryhour / orders / New column: sum


wednesday / 12:00 / Friday / 17:00 / 5 / 0
wednesday / 12:00 / Friday / 17:00 / 2 / 7
thursday / 13:00 / Friday / 17:00 / 3 / 3
thursday/ 13:00 / Friday / 19:00 / 1 / 1

The last column is the one I want to add. In the end, the rows with zero can be removed, so I no longer have duplicate data. So only if all columns match (except orders) the sum of the orders should be taken. In the other cases the same value can be kept.

Can someone please help me?

Thanks in advance!

  • Please see https://stackoverflow.com/a/5963610/7654451 to make a reproducible example. – Tito Sanz Jan 03 '18 at 10:47
  • Why does the first row have a 0 sum and not a 5? What happens if you have multiple rows? Do you want 0s in all rows apart from the row that has the total sum? Can you post an example where a specific group has 3+ rows and the ideal output? – AntoniosK Jan 03 '18 at 11:55
  • Are you following the discussion in the comments to @AntoniosK answer, Lara? – Rui Barradas Jan 03 '18 at 14:50

1 Answers1

1
# example data
df = read.table(text = "
orderdate/orderhour/deliverydate/deliveryhour/orders
wednesday/12:00/Friday/17:00/5 
wednesday/12:00/Friday/17:00/2 
thursday/13:00/Friday/17:00/3 
thursday/13:00/Friday/19:00/1
", sep="/", header=T, stringsAsFactors=F)

library(dplyr)

df %>%
  group_by(orderdate, orderhour, deliverydate, deliveryhour) %>% # for each combination of those variables
  mutate(sum_orders = ifelse(n() > 1 &                           # if there are multiple rows
                             row_number() == 1,                  # and this is row 1
                             0,                                  # use a zero
                             cumsum(orders))) %>%                # else use the cum sum
  ungroup()                                                      # forget the grouping

# # A tibble: 4 x 6
#   orderdate orderhour deliverydate deliveryhour orders sum_orders
#   <chr>     <chr>     <chr>        <chr>         <dbl>      <dbl>
# 1 wednesday 12:00     Friday       17:00          5.00       0   
# 2 wednesday 12:00     Friday       17:00          2.00       7.00
# 3 thursday  13:00     Friday       17:00          3.00       3.00
# 4 thursday  13:00     Friday       19:00          1.00       1.00
AntoniosK
  • 15,991
  • 2
  • 19
  • 32
  • The result is right but it deletes the column `orders`, it should add a new column. – Rui Barradas Jan 03 '18 at 11:45
  • @RuiBarradas I was just checking that! :) I'm a bit confused with the zeros in multiple rows within a group. Do they need 0 only in the first row? (So I can use a `cumsum` approach). Or they need 0s in every rows apart from the last row, that has the total sum for that group? – AntoniosK Jan 03 '18 at 11:54
  • The OP says that "in the end, the rows with zero can be removed", so I believe that they don't need 0 at all. What they should need is the column that vanished, the last one before the sum. – Rui Barradas Jan 03 '18 at 14:30
  • @RuiBarradas I agree on that. My question is how the OP uses 0s when a specific group has 3 or more rows. In the above example groups have 1 or 2 rows. If the group `wednesday / 12:00 / Friday / 17:00` had 3 rows with `orders` `5,1,1` do we expect the sums to be `0,6,7` or `0,0,7`? – AntoniosK Jan 03 '18 at 14:38
  • For what I understand, it should be `0,6,7` and therefore the `cumsum` approach, like you said earlier, would be the way to do it. But the question is not clear. – Rui Barradas Jan 03 '18 at 14:45
  • Not sure if it's better to delete my answer until the question becomes clear, or to add multiple approaches :) – AntoniosK Jan 03 '18 at 14:48
  • @RuiBarradas I've updated my answer because it's a FACT that they need the new column added to the original one and my previous answer didn't cover that. I can easily change/treat the added 0s within the `ifelse` if needed. – AntoniosK Jan 03 '18 at 15:06