-2

I have the following data frame, listing the spends for each category for each day

Dataframe: actualSpends
Date         Category  Spend ($) 
2017/01/01   Apple     10
2017/01/02   Apple     12
2017/01/03   Apple     8 
2017/01/01   Banana    13
2017/01/02   Banana    15
2017/01/03   Banana    7

I want to create a new data frame that will list down the average amount spend for each category, for each day of the month. (e.g. On the 3rd of the month, the average spend of all days that have passed in the month, from the 1st to 31st of each month. )

EDIT: So the output should look something like..

Date         Category  AvgSpend ($) 
2017/01/01   Apple     10
2017/01/02   Apple     11
2017/01/03   Apple     10 
2017/01/01   Banana    13
2017/01/02   Banana    14
2017/01/03   Banana    11.7

Where for each category, the average spend for each day is an average of all the days past. 1st, is an average of 1st. 2nd is an average of 1st + 2nd. 3rd is an average of 1st + 2nd + 3rd.

Is there any workaround for this?

www
  • 4,124
  • 1
  • 11
  • 22
  • Possible duplicate of [Mean per group in a data.frame](https://stackoverflow.com/questions/21982987/mean-per-group-in-a-data-frame) – Ronak Shah Aug 31 '17 at 03:42
  • Now that I know you're looking for cumulative means, I've redone my answer to concisely address that. Please accept the answer if this solves your question so that people with the same question in the future can find their answer even faster. – www Sep 01 '17 at 05:06

3 Answers3

0

We can use the cummean function from the dplyr package to calculate cumulative averages for each category; then melt the results into a new column:

library(dplyr)
library(reshape2)

unq <- unique(df$Category)

df$AvgSpend <- melt(
  sapply(1:length(unq),
     function(i) cummean(df$Spending[which(df$Category==unq[i])])))$value

Output:

        Date Category Spending AvgSpend
1 2017/01/01    Apple       10 10.00000
2 2017/01/02    Apple       12 11.00000
3 2017/01/03    Apple        8 10.00000
4 2017/01/01   Banana       13 13.00000
5 2017/01/02   Banana       15 14.00000
6 2017/01/03   Banana        7 11.66667

Sample data:

df <- data.frame(Date=c("2017/01/01","2017/01/02","2017/01/03",
                        "2017/01/01","2017/01/02","2017/01/03"),
                 Category=c("Apple","Apple","Apple",
                            "Banana","Banana","Banana"),
                 Spending=c(10,12,8,13,15,7))
www
  • 4,124
  • 1
  • 11
  • 22
  • i think this function does a average(group by) which isn't what i'm looking for. More like, for the 3rd, average of daily spends of the 1st to 3rd of that month. – sharkiecodes Aug 31 '17 at 06:32
  • It sounds like what you're looking for is a cumulative mean. I've redone my answer to give you that instead. – www Sep 01 '17 at 05:07
0

Here is a tidyverse option

library(tidyverse)
df %>%
  group_by(Date, Category) %>% 
  summarise(Spending = mean(Spending, na.rm = TRUE))
# A tibble: 4 x 3
# Groups:   Date [?]
#        Date Category Spending
#      <fctr>   <fctr>    <dbl>
#1 2017/01/01    Apple       11
#2 2017/01/02   Banana       14
#3 2017/01/03    Apple        8
#4 2017/01/03   Banana        7
akrun
  • 874,273
  • 37
  • 540
  • 662
0

You can use 'sqldf' (https://cran.r-project.org/web/packages/sqldf/sqldf.pdf) package

install.packages("sqldf")

library(sqldf)

actualSpends <- data.frame( Date = c('2017/01/01','2017/01/02', '2017/01/03','2017/01/01','2017/01/02','2017/01/03'), Category =('Apple','Apple','Apple','Banana','Banana','Banana'), Spend = c(10,12,8,13,15,7))

sqldf("select Date,Category,sum(Spend) from actualSpends group by Date,Category ")

Balaji
  • 112
  • 2