0

I have the following data table :

structure(list(Date = c("2015-04-01", "2015-04-01", "2015-04-01", "2015-04-01", "2015-04-01", "2015-04-01"), Category = structure(c(4L, 4L, 4L, 5L, 5L, 6L), .Label = c("Bakery ", "Branded goods", "Breakfast ", "Canned/Packaged ", "Cooking essentials ", "Household ", "NO CATEGORY", "Personal care", "Stationary ", "Vehicle accessories"), class = "factor"), Sub_Category = c("carbonated drink ", "carbonated drink ", "carbonated drink ", "Dairy ", "Dairy ", "Stationary "), Product = c("soft drink", "soft drink", "soft drink", "Butter ", "Butter ", "A4 paper"), Brand = c("7 up ", "7 up ", "7 up ", "Amul", "Amul", "NO BRAND"), Day = c(1L, 1L, 1L, 1L, 1L, 1L), Month = c(4L, 4L, 4L, 4L, 4L, 4L), Year = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L), MRP = c("55", "25", "70", "37", "37", "0.5"), Quantity = c(1, 1, 1, 1, 1, 20), Sales = c(55, 25, 70, 37, 37, 10), Wday = c("Wednesday", "Wednesday", "Wednesday", "Wednesday", "Wednesday", "Wednesday"), Week = c(13L, 13L, 13L, 13L, 13L, 13L), X = c(NA, NA, NA, NA, NA, NA), X. = c(NA, NA, NA, NA, NA, NA)), .Names = c("Date", "Category", "Sub_Category", "Product", "Brand", "Day", "Month", "Year", "MRP", "Quantity", "Sales", "Wday", "Week", "X", "X."), sorted = "Date", class = c("data.table", "data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x00000000001b0788>)

and I wish to show the sum of(Quantity) or sum(sales) for each type of category for each day spanning the entire date column.

I have tried :

data2 <- data %>% group_by(data$Date) %>% summarise_each(funs(sum))

but i get:

Error in is_list(x) : object 'rlang_is_list' not found

Also tried :

aggregate(cbind(data$Category,data$Sales,data$Quantity)~data$Date,
               data=data,FUN=sum)

this produces an entirely different output. ;/

Is there a way to get this done ?

Even after uninstalling rlang,dplyr and ggplot2 package, the error still remains. Is there a way around this ?

Expected output is like so : enter image description here

Thanks in advance

9minus4
  • 53
  • 7
  • 2
    may not be your problem, but don't use the $ in group_by. Just `group_by(Date)` – Richard Telford Jun 01 '18 at 05:04
  • 1
    We don't have a reproducble example here, but why use data.table if you are doing everything in dplyr? Or why use dplyr if you have a data.table? Either-way, please provide `dput(head(data))` – David Arenburg Jun 01 '18 at 05:34
  • @DavidArenburg Please check again, as I have updated the question with the dput(head). I had been trying with dplyr, but the "rlang_is" issue keeps hindering it. So i figured i try with data table, using setkey [ ] and then lapply. But it didn't seem to produce any useful output either. – 9minus4 Jun 01 '18 at 05:46
  • Cannot reproduce the issue. Could you make a reproducible example with package [`reprex`](https://github.com/tidyverse/reprex)? – mt1022 Jun 01 '18 at 06:55
  • 1
    @9minus4: restart R, then reinstall `rlang` & `dplyr` – Tung Jun 01 '18 at 07:00

2 Answers2

3

This is a much simpler way using data.table and reshape2 on the data you provided

library(reshape2)
library(data.table)
setDT(data)
data2<-dcast(data[,.(Quantity=sum(Quantity),Sales=sum(Sales)),by=.(Category,Date)],Date~Category,value.var = c("Quantity","Sales"),fun.aggregate = sum)

You have both the sales and quantity

SatZ
  • 430
  • 5
  • 14
  • You can use the snippet below to remove all packages, https://stackoverflow.com/questions/7505547/detach-all-packages-while-working-in-r – SatZ Jun 01 '18 at 09:57
  • This is much simpler indeed! Thank you @SatZ. Very useful – 9minus4 Jun 01 '18 at 10:06
1

You can use summarise_at for Sales & Quantity only

# calculate total sales
library(dplyr)
data2 <- data %>% 
  group_by(Date, Category) %>% 
  summarise_at(vars(Sales, Quantity), sum, na.rm = TRUE) %>% 
  mutate(Total = Sales * Quantity)
data2

#> # A tibble: 10 x 5
#> # Groups:   Date [1]
#>    Date       Category              Sales Quantity Total
#>    <chr>      <fct>                 <dbl>    <dbl> <dbl>
#>  1 2015-04-01 "Bakery "                 0        0     0
#>  2 2015-04-01 Branded goods             0        0     0
#>  3 2015-04-01 "Breakfast "              0        0     0
#>  4 2015-04-01 "Canned/Packaged "      150        3   450
#>  5 2015-04-01 "Cooking essentials "    74        2   148
#>  6 2015-04-01 "Household "             10       20   200
#>  7 2015-04-01 NO CATEGORY               0        0     0
#>  8 2015-04-01 Personal care             0        0     0
#>  9 2015-04-01 "Stationary "             0        0     0
#> 10 2015-04-01 Vehicle accessories       0        0    NA

Then convert from long to wide table using tidyr::spread

library(tidyr)
data2_long <- data2 %>% 
  select(-Quantity, -Sales) %>% 
  spread(Category, Total)
data2_long

# A tibble: 1 x 11
# Groups:   Date [1]
  Date       `Bakery ` `Branded goods` `Breakfast ` `Canned/Packaged ` `Cooking essentials ` `Household ` `NO CATEGORY`
  <chr>          <dbl>           <dbl>        <dbl>              <dbl>                 <dbl>        <dbl>         <dbl>
1 2015-04-01         0               0            0                450                   148          200             0
# ... with 3 more variables: `Personal care` <dbl>, `Stationary ` <dbl>, `Vehicle accessories` <dbl>

Data used:

data <- structure(list(Date = c("2015-04-01", "2015-04-01", "2015-04-01", 
    "2015-04-01", "2015-04-01", "2015-04-01"), Category = structure(c(4L, 
    4L, 4L, 5L, 5L, 6L), .Label = c("Bakery ", "Branded goods", "Breakfast ", 
    "Canned/Packaged ", "Cooking essentials ", "Household ", "NO CATEGORY", 
    "Personal care", "Stationary ", "Vehicle accessories"), class = "factor"), 
    Sub_Category = c("carbonated drink ", "carbonated drink ", 
    "carbonated drink ", "Dairy ", "Dairy ", "Stationary "), 
    Product = c("soft drink", "soft drink", "soft drink", "Butter ", 
    "Butter ", "A4 paper"), Brand = c("7 up ", "7 up ", "7 up ", 
    "Amul", "Amul", "NO BRAND"), Day = c(1L, 1L, 1L, 1L, 1L, 
    1L), Month = c(4L, 4L, 4L, 4L, 4L, 4L), Year = c(2015L, 2015L, 
    2015L, 2015L, 2015L, 2015L), MRP = c("55", "25", "70", "37", 
    "37", "0.5"), Quantity = c(1, 1, 1, 1, 1, 20), Sales = c(55, 
    25, 70, 37, 37, 10), Wday = c("Wednesday", "Wednesday", "Wednesday", 
    "Wednesday", "Wednesday", "Wednesday"), Week = c(13L, 13L, 
    13L, 13L, 13L, 13L), X = c(NA, NA, NA, NA, NA, NA), X. = c(NA, 
    NA, NA, NA, NA, NA)), .Names = c("Date", "Category", "Sub_Category", 
    "Product", "Brand", "Day", "Month", "Year", "MRP", "Quantity", 
    "Sales", "Wday", "Week", "X", "X."), sorted = "Date", 
    class = c("data.frame"), 
    row.names = c(NA, -6L))

P.S: to solve the object 'rlang_is_list' not found problem. Follow the solution in this answer

Try launching R from a terminal / console session

R --vanilla

Then, uninstall and reinstall of rlang with something like:

remove.packages("rlang")
install.packages("rlang")
Tung
  • 26,371
  • 7
  • 91
  • 115
  • hi @Tung Thank you for this. It is indeed helpful. It still throws up the same error : "rlang_list is not found".............. Moreover how can I get this sum for each category for each day. ie 2015-04-01 would have 10 columns beside it for each category and inside is the total of sales made that day for that item category and 2015-04-02 would be for the next .. row wise so as to speak. – 9minus4 Jun 01 '18 at 06:43
  • @9minus4: please post your expected output. manually calculate 2 or 3 rows should do it – Tung Jun 01 '18 at 06:45
  • Please check now, I have added the expect o/p in the question – 9minus4 Jun 01 '18 at 06:47
  • @9minus4: your expected output doesn't match with your data. You need to explain how you come up with that output in the question as well – Tung Jun 01 '18 at 06:52
  • 1
    Thank you for your answer! The final solution worked. – 9minus4 Jun 01 '18 at 08:38