-1
> Total
                          Product Quantity Price Total
 1:                  tomatoes 1kg        1    16    16
 2:                small cucumber        1    10    10
 3:                  beetroot 1kg        1    15    15
 4:                 potatoes 1 kg        1    15    15
 5:                asparagus 200g        4    45   180
 6:           red apples 4 medium        1    10    10
 7:       beef fillet strips 500g        1    90    90
 8:               back bacon 200g        1    30    30
 9: chicken drums and thighs 1 kg        1    75    75
10:             kudu biltong 250g        2    80   160
11:                   t bone 500g        1    66    66
12:             free range eggs 6        1    15    15
13:                  tomatoes 1kg        1    16    16
14:  calistos jalape=c3=b1o salsa        1    40    40
15:               lean beef mince        1    54    54
16:            free range eggs 30        1    65    65
17:                 potatoes 1 kg        1    15    15
18:        strawberry punnet 250g        1    22    22
19:          chicken whole 1.4 kg        1    65    65
20:                small cucumber        4    10    40
21:                   swiss chard        2    14    28
22:                  tomatoes 1kg        3    16    48
23:                   carrot 1 kg        2    14    28
24:                          kale        2    14    28
25:               butternut cubes        2    14    28
26:                 potatoes 1 kg        2    15    30
27:                   onions 1 kg        1    15    15
28:  oyster mushrooms 200g punnet        1    35    35
29:        strawberry punnet 250g        2    22    44
30:            free range eggs 30        1    65    65
31:                small cucumber        2    10    20
32:                  tomatoes 1kg        1    16    16
33:                 broccoli head        1    25    25
34:        cauliflower whole head        2    25    50
35:                   carrot 1 kg        2    14    28
36:               butternut cubes        2    14    28
37:                          kale        2    14    28
38:                   butter 500g        1    57    57
39:  oyster mushrooms 200g punnet        2    35    70
40:                      coleslaw        1    15    15
                          Product Quantity Price Total

I have this data frame with a lot of duplicates and i've tried different methods of collapsing them together but it always removes the price and total columns.

aggregate(Quantity~Product,data=Total,FUN=sum)

Gets me:

> Total
                         Product Quantity
1                 asparagus 200g        4
2                back bacon 200g        1
3        beef fillet strips 500g        1
4                   beetroot 1kg        1
5                  broccoli head        1
6                    butter 500g        1
7                butternut cubes        4
8   calistos jalape=c3=b1o salsa        1
9                    carrot 1 kg        4
10        cauliflower whole head        2
11 chicken drums and thighs 1 kg        1
12          chicken whole 1.4 kg        1
13                      coleslaw        1
14            free range eggs 30        2
15             free range eggs 6        1
16                          kale        4
17             kudu biltong 250g        2
18               lean beef mince        1
19                   onions 1 kg        1
20  oyster mushrooms 200g punnet        3
21                 potatoes 1 kg        4
22           red apples 4 medium        1
23                small cucumber        7
24        strawberry punnet 250g        3
25                   swiss chard        2
26                   t bone 500g        1
27                  tomatoes 1kg        6

This does collapse it but it removes the other columns.

Total %>% group_by(Product) %>%  summarise(quantity = sum(Quantity))

Does the same thing.

Expected output should have the Price and Total with all the Product's merged.

link to the dataframe

Noodle18
  • 5
  • 4

4 Answers4

0

Try this:

df = data.frame(product = c("tomatoes 1kg", "tomatoes 1kg", "small cucumber"),
                quantity = c(1, 1, 1),
                price = c(16, 16, 10),
                total = c(16, 16, 10))
df[-duplicated(df),]
Ventrilocus
  • 1,408
  • 3
  • 13
  • This isnt gonna work i currently have most of the code running off of text so it comes in as raw text and i convert it into a data frame so id have to do this for over 100 products manually like this and id like to avoid that – Noodle18 Nov 12 '20 at 18:35
  • 1
    df is just an example dataset. It is good practice to provide a dataset when asking a question. I created one for you. Please check if the answer is valid on your large dataset – Ventrilocus Nov 12 '20 at 18:37
  • `Error: duplicated() applies only to vectors` – Noodle18 Nov 12 '20 at 19:04
0

summarise will only keep the grouping columns and the ones you calculate with the aggregate functions.

If a given Product always has the same Price, you could do:

Total %>% group_by(Product, Price) %>% summarise(Quantity = sum(Quantity), Total = sum(Total))

[By the way, I personally wouldn't recommend naming your data frame "Total" if that's already the name of a column -- it's a bit confusing.]

If a given Product doesn't always have the same Price (for example, if the data comes from multiple dates, and prices could change over time), then you have to decide whether (A) you still want one row per Product, or (B) you want one row for every distinct (Product, Price) pair. If you choose, (A), then you have to choose which Price you want to include for each Product. The first Price listed for that Product? The minimum Price? The maximum Price? The average Price? etc.

This explains why summarise drops the columns that you don't include in your group_by or aggregate functions: It has know way of knowing which of the above options you would choose, unless you tell it.

If you want to keep the average price for each product, you could do this:

Total %>% group_by(Product) %>%
          summarise(Quantity = sum(Quantity),
          AveragePrice = sum(Total)/sum(Quantity),
          Total = sum(Total))

[Here I didn't just do AveragePrice = mean(Price), because that would be the average price per row for that product, whereas what you probably would want is the average price per unit sold.]

Tim Goodman
  • 23,308
  • 7
  • 64
  • 83
  • I should add, I assume in this answer that what you called "duplicates" represent separate observations (i.e., different purchases of the same product). But if they were actually duplicate records for the same observation (such as a purchase that got accidentally recorded twice), you could just eliminate them like this: `Total %>% distinct()` – Tim Goodman Nov 12 '20 at 18:39
  • `Quantity Total 1 61 1685` first one gives me this – Noodle18 Nov 12 '20 at 18:44
  • `Quantity AveragePrice Total 1 61 27.62295 1685` Second one gives me this. – Noodle18 Nov 12 '20 at 18:45
  • Yeah sorry i mean separate observations because these are joined data frames using `rbind` and `map` i couldnt find a way to use `left_join` or `full_join` with map – Noodle18 Nov 12 '20 at 19:00
0

Please familiarize yourself on how to provide a proper reproducible example and what to do when someone answers your question.

Here is a dplyr solution.

df = data.frame(product = c("tomatoes 1kg", "tomatoes 1kg", "small cucumber"),
                quantity = c(1, 1, 1),
                price = c(16, 16, 10),
                total = c(16, 16, 10))

library(dplyr)
df %>% 
  group_by(product, price) %>% 
  summarise(
    quantity = sum(quantity),
    total = sum(total)
  )

#> # A tibble: 2 x 4
#> # Groups:   product [2]
#>   product        price quantity total
#>   <chr>          <dbl>    <dbl> <dbl>
#> 1 small cucumber    10        1    10
#> 2 tomatoes 1kg      16        2    32

Created on 2020-11-12 by the reprex package (v0.3.0)

Eric
  • 2,699
  • 5
  • 17
  • ` quantity total ` `1 61 1685` Same thing as the others i think it might be the way im getting this data frame im not sure but i need all the products listed with prices next to them. – Noodle18 Nov 12 '20 at 18:56
  • Please provide a reproducible example and proper expected output. It is hard for us to know what you are looking for based on your question. – Eric Nov 12 '20 at 19:02
  • Its hard to provide a reproducible example because I've had to text filter and convert variables many times to create the data frame my expected out put is the same as yours but i'm just getting a complete sum of all the columns and not your result. – Noodle18 Nov 12 '20 at 19:10
  • Thats a bad way to say it what i mean is im getting the `quantity` and `total` columns sum but no products are showing – Noodle18 Nov 12 '20 at 19:17
  • Did you use the code provided in my solution? Did you substitute `df` with the name of your dataset? – Eric Nov 12 '20 at 19:20
  • Yes I did it gives me a data frame with the sum of `quantity` and the sum of `total` and there is no individual products. – Noodle18 Nov 12 '20 at 19:21
  • 1
    I need to work with a reproducible example to help you further. To produce a minimal data set, you can use `head()`, `subset()`, or the indices. Then use `dput()` to give us something that can be put in R immediately. – Eric Nov 12 '20 at 19:23
  • Downloadable in the Body of the Question. – Noodle18 Nov 12 '20 at 19:38
0

Use gather to represent the variables in columns as rows. Then use the group and summary functions to get the sum/mean of each group. Again the data can be presented column wise using spread.

Total %>%
  gather(key = variable, value = value, c(Quantity,Price,Total)) %>%
  group_by(Product, variable) %>%
  summarize(sum = sum(value)) %>%
  spread(variable, sum)
JineshEP
  • 738
  • 4
  • 7