I have the following data frame with thousands of rows; a sample of it is shown below. It records the orders placed on an e-commerce website and lists the products purchased under each order id.

     | order_id| product_id|product_name                     |
     |--------:|----------:|:--------------------------------|
     |  1187899|        196|Soda                             |
     |  1187899|      25133|Organic String Cheese            |
     |  1187899|      38928|0% Greek Strained Yogurt         |
     |  1187899|      26405|XL Pick-A-Size Paper Towel Rolls |
     |  1187899|      39657|Milk Chocolate Almonds           |
     |  1187899|      10258|Pistachios                       |
     |  1187899|      13032|Cinnamon Toast Crunch            |
     |  1187899|      26088|Aged White Cheddar Popcorn       |
     |  1187899|      27845|Organic Whole Milk               |
     |  1187899|      49235|Organic Half & Half              |
     |  1187899|      46149|Zero Calorie Cola                |
     |  1492625|      22963|Organic Roasted Turkey Breast    |
     |  1492625|       7963|Gluten Free Whole Grain Bread    |
     |  1492625|      16589|Plantain Chips                   |
     |  1492625|      32792|Chipotle Beef & Pork Realstick   |

The code used to produce the data frame above is:

    library(dplyr)   # %>%, inner_join(), select()
    library(knitr)   # kable()

    temp <- orders %>%
      inner_join(opt, by = "order_id") %>%
      inner_join(products, by = "product_id") %>%
      select(order_id, product_id, product_name)

    kable(head(temp, 15))

I want to count how many times each product was ordered and find the most-ordered products. My output should look something like this:

     product_id | Order_Count
        196         10025
        7963        9025
        25133       8903

I cannot figure out how to go about this. I've tried the following:

      mutate(prods = count(product_id))

But it did not work. I got an error saying: Error in mutate_impl(.data, dots) : Evaluation error: no applicable method for 'groups' applied to an object of class "factor".

Any help will be appreciated!

  • `table(temp$product_id)`? – Rui Barradas May 13 '18 at 11:02
  • Thank you, so simple, and it worked. In the end I used `sort(table(temp$product_name), decreasing = TRUE)` to sort it in descending order. Now I'm figuring out how to use it in ggplot. – Gaurang Swarge May 13 '18 at 11:36
  • As for package `ggplot2`, I suggest you ask another question. But please post data using `dput(temp)` or, if `temp` is too big, using `dput(head(temp, 30))`. – Rui Barradas May 13 '18 at 11:39
  • Any particular reason you suggest using `dput`? – Gaurang Swarge May 13 '18 at 12:15
  • @GaurangSwarge `dput(temp)` prints the details of your data.frame in a format that makes it easier for others to recreate it and provide a solution. Otherwise, who will do loads of unnecessary typing to help you? – MKR May 13 '18 at 13:33

1 Answer

You can use `table()` to print a simple table (as Rui Barradas suggested in the comments), or `dplyr::count()` if you want a data frame with the counts.

library(tidyverse)

orders <- tibble::tribble(
  ~order_id, ~product_id, ~product_name,
  "1187899", "196", "Soda",
  "1187899", "25133", "Organic String Cheese",
  "1187899", "38928", "0% Greek Strained Yogurt",
  "1187899", "26405", "XL Pick-A-Size Paper Towel Rolls",
  "1187899", "39657", "Milk Chocolate Almonds",
  "1187899", "10258", "Pistachios",
  "1187899", "10258", "Pistachios",
  "1187899", "10258", "Pistachios",
  "1187899", "13032", "Cinnamon Toast Crunch",
  "1187899", "13032", "Cinnamon Toast Crunch",
  "1187899", "26088", "Aged White Cheddar Popcorn",
  "1187899", "27845", "Organic Whole Milk",
  "1187899", "49235", "Organic Half & Half",
  "1187899", "46149", "Zero Calorie Cola",
  "1492625", "22963", "Organic Roasted Turkey Breast",
  "1492625", "7963", "Gluten Free Whole Grain Bread",
  "1492625", "16589", "Plantain Chips",
  "1492625", "32792", "Chipotle Beef & Pork Realstick"
)

For a simple printed table of, for example, the count per product_id:

table(orders$product_id)
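
If the printed table should be ordered from most to least frequent, as the asker ended up doing in the comments, something along these lines may help. This is a minimal base R sketch; the `product_id` vector is a made-up miniature of the data, and `counts_df` and the `Order_Count` column name are illustrative choices rather than anything required by `table()`.

```r
# Hypothetical miniature of the product_id column
product_id <- c("196", "10258", "10258", "10258", "13032", "13032", "7963")

# table() tallies each distinct value; sort() orders the tallies
counts <- sort(table(product_id), decreasing = TRUE)

# as.data.frame() converts the table into a two-column data frame
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(counts_df) <- c("product_id", "Order_Count")

counts_df
```

Converting to a data frame is optional; it only matters if the counts are going to be passed on to something that expects columns rather than a named table.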

But if you want a data frame with the counts, to plot or to use for anything else, then:

orders %>%
  count(product_id, product_name)

> # A tibble: 15 x 3
>    product_id product_name                         n
>    <chr>      <chr>                            <int>
>  1 10258      Pistachios                           3
>  2 13032      Cinnamon Toast Crunch                2
>  3 16589      Plantain Chips                       1
>  4 196        Soda                                 1
>  5 22963      Organic Roasted Turkey Breast        1
>  6 25133      Organic String Cheese                1
>  7 26088      Aged White Cheddar Popcorn           1
>  8 26405      XL Pick-A-Size Paper Towel Rolls     1
>  9 27845      Organic Whole Milk                   1
> 10 32792      Chipotle Beef & Pork Realstick       1
> 11 38928      0% Greek Strained Yogurt             1
> 12 39657      Milk Chocolate Almonds               1
> 13 46149      Zero Calorie Cola                    1
> 14 49235      Organic Half & Half                  1
> 15 7963       Gluten Free Whole Grain Bread        1
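
To get closer to the layout requested in the question (most-ordered products first, with a column called `Order_Count`), `count()` accepts `sort` and `name` arguments. The sketch below assumes a dplyr version that supports `name` (0.8.0 or later); `orders_small` and `top_products` are hypothetical names used only for illustration. The original `mutate(prods = count(product_id))` attempt most likely failed because `count()` expects a data frame as its first argument, not a single column.

```r
library(dplyr)

# Hypothetical miniature of the joined orders data
orders_small <- data.frame(
  order_id   = c("1187899", "1187899", "1187899", "1187899",
                 "1187899", "1187899", "1492625"),
  product_id = c("196", "10258", "10258", "10258",
                 "13032", "13032", "7963"),
  stringsAsFactors = FALSE
)

# count() groups by product_id, tallies the rows per group,
# names the tally column Order_Count and sorts it in descending order
top_products <- orders_small %>%
  count(product_id, name = "Order_Count", sort = TRUE)

top_products
```

Note that this counts rows, so a product appearing twice within the same order is counted twice. If each order should contribute at most once per product, calling `distinct(order_id, product_id)` before `count()` is one way to handle that.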