For some reason I can't group and sum my data.

  amazon2 <- amazon %>%
    group_by(amazon.order.id, quantity.shipped) %>%
    summarize(amazon2, quantity = sum(quantity.shipped, na.rm = TRUE))

glimpse shows this:

  Groups: amazon.order.id [388]
  $ amazon.order.id  "204-0311626-3448315", "204-9226726-5233164", "026-2318018-...
  $ quantity.shipped 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,...

The result is just a single cell containing 491 and nothing else.

camille
  • Is the amazon orderID format too complex to group? – Matthew Appleyard Oct 02 '19 at 12:14
  • I think you misunderstand what `group_by` does. It does not summarize anything. Rather, all following functions are applied group-wise. Have a look at the [examples here](https://dplyr.tidyverse.org/reference/group_by.html) – JBGruber Oct 02 '19 at 12:16
  • Also maybe have a look at how to write a [good question](https://stackoverflow.com/a/5963610/5028841). It's neither entirely clear what you want to do with your data nor how your data looks. – JBGruber Oct 02 '19 at 12:19
  • Another good resource is [this tutorial](https://dplyr.tidyverse.org/articles/dplyr.html#grouped-operations) – JBGruber Oct 02 '19 at 12:22
  • Well, running this results in just one cell with a total figure, and I lose all the order IDs: `amazon2 <- as.data.frame(amazon) %>% select(amazon.order.id, quantity.shipped) %>% group_by(amazon.order.id) %>% summarise(quantity = sum(quantity.shipped))` – Matthew Appleyard Oct 02 '19 at 13:02

3 Answers

Read your code, line by line, and then compare to your last line of text.

You do not anywhere specify that you want to sum anything.

Try editing the last line to

group_by(amazon.order.id) %>% summarise(sum(quantity.shipped))

and then go read https://datacarpentry.org/R-genomics/04-dplyr.html#split-apply-combine_data_analysis_and_the_summarize()_function

MrGumble

group_by won't sum the values of the groups. It only defines the groups within which later operations, such as summarizing, are performed.

You see that the glimpse starts with "Groups: amazon.order.id [388]". It means that there are 388 groups in your set.
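To make the difference concrete, here is a minimal sketch. The small `amazon` data frame below is made up for illustration; only the two column names come from the question. It assumes dplyr is installed and that `summarise` is called with the `dplyr::` prefix so no other package can mask it.

```r
library(dplyr)

# Made-up stand-in for the real data; only the column names match the question.
amazon <- data.frame(
  amazon.order.id  = c("A-001", "A-001", "B-002", "C-003", "C-003"),
  quantity.shipped = c(1, 2, 1, NA, 3)
)

# group_by() only marks the groups: still five rows, nothing is summed yet.
grouped <- amazon %>% group_by(amazon.order.id)
nrow(grouped)      # 5 rows
n_groups(grouped)  # 3 groups

# summarise() is then applied per group: one row per order ID.
amazon2 <- grouped %>%
  dplyr::summarise(quantity = sum(quantity.shipped, na.rm = TRUE))
amazon2
```

With this toy data the result has three rows, one per order ID, with quantities 3, 1 and 3. If you get a single row holding the grand total instead, the grouping was ignored somewhere along the way.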

Monique Oliveira
amazon2 <- as.data.frame(amazon) %>% 
  group_by(amazon.order.id) %>% 
  select(amazon.order.id, quantity.shipped) %>%  
  dplyr::summarise(quantity = sum(quantity.shipped))

It seems that plyr overrides dplyr's `summarise`, which caused the errors, so calling `dplyr::summarise` explicitly is the answer.

  • Try not to use `plyr` at all if possible. `dplyr` is the successor of `plyr` and there shouldn't be any reason to still use `plyr`. It will only cause you headaches. – JBGruber Oct 02 '19 at 15:07
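If you suspect this kind of masking, a quick check along these lines may help. It is only a sketch: it assumes dplyr is attached and uses base R tools to see which package a bare `summarise` call resolves to in the current session.

```r
library(dplyr)

# Name of the package whose summarise() a bare call would use right now.
resolved <- environmentName(environment(summarise))
resolved

# Every attached package that provides a `summarise`, in search-path order;
# the first entry is the one that wins.
providers <- find("summarise")
providers

# If `resolved` turns out to be "plyr", either keep writing dplyr::summarise(),
# or detach plyr first, e.g. detach("package:plyr", unload = TRUE).
```

When only dplyr is attached, `resolved` should be "dplyr". Seeing "plyr" there would be consistent with the single-cell total described in the question, since the grouping is then not used.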