0

I have a dataset of observations by day for several months and need to find the average of the observations for each day. The data is from a tab delimited text file with the following column names: Day, Date, Views, Engagement, Sales. I'm trying to find the average Views, Engage., and Sales for all 7 days of the week. In SAS I would have just used proc tabulate with Day as the class and Views, Engagements, and Sales as the variables but I'm unsure of how to translate this into R code.

Monday  21JUL03 7206    32  $6.73
Tuesday 22JUL03 9333    51  $4.99
Wednesday   23JUL03 8321    61  $8.87
Thursday    24JUL03 8378    35  $3.69
Friday  25JUL03 12202   45  $4.34
Saturday    26JUL03 6161    34  $3.12
Sunday  27JUL03 9115    29  $2.77
Monday  28JUL03 17112   51  $10.36
Tuesday 29JUL03 12690   51  $10.24
Wednesday   30JUL03 10822   30  $3.96
Thursday    31JUL03 10395   41  $5.45
Friday  01AUG03 6979    31  $2.95
Saturday    02AUG03 3810    19  $1.78
Sunday  03AUG03 4554    30  $5.71
regents
  • 600
  • 6
  • 15
  • 2
    take a look at `?aggregate` ... – Ben Bolker Apr 07 '18 at 23:55
  • first parse the date, then group, and finally summarise: library(dplyr) library(anytime) data %>% mutate(day=anytime(day)) %>% group_by(day) %>% summarize(total_sales=mean(sales)) – Elio Diaz Apr 08 '18 at 00:03
  • `DF <- transform(DF, Date = as.Date(Date, "%d%b%y"), Sales = as.numeric(sub("$", "", Sales, fixed = TRUE))); aggregate(cbind(Views, Engagement, Sales) ~ Day, DF, mean)` – G. Grothendieck Apr 08 '18 at 00:14

2 Answers2

1

OP wants to calculate mean for 3 columns of his data.frame. Hence, dplyr::summarise_at should be a good option to go for.

The solution is two steps process as:

  1. Read from tab separated file
  2. Process data using dplyr

Solution:

# Read from file. "sales.txt" has been created using OP's data.
df <- read.delim("sales.txt", header = FALSE, stringsAsFactors = FALSE)
names(df) <- c("Day", "Date", "Views", "Engagement", "Sales")

library(dplyr)

df %>% mutate(Sales = as.numeric(sub("\\$","", Sales))) %>%
  group_by(Day) %>%
  summarise_at(vars(c("Views", "Engagement", "Sales")),funs(Mean = mean))


# Result
# # A tibble: 7 x 4
#   Day       Views_Mean Engagement_Mean Sales_Mean
#   <chr>          <dbl>           <dbl>      <dbl>
# 1 Friday          9590            38.0       3.64
# 2 Monday         12159            41.5       8.54
# 3 Saturday        4986            26.5       2.45
# 4 Sunday          6834            29.5       4.24
# 5 Thursday        9386            38.0       4.57
# 6 Tuesday        11012            51.0       7.62
# 7 Wednesday       9572            45.5       6.41
MKR
  • 19,739
  • 4
  • 23
  • 33
0

Maybe something like this?

library(tidyverse)

Date <- seq(lubridate::ymd('2012-07-03'),lubridate::ymd('2012-07-20'),by='days')
Day <- lubridate::wday(Date, label = TRUE)
Views <- sample(c(4000:20000), length(Date))
Engagement <- sample(c(20:50), length(Date))
Sales <- sample.int(300:1000, length(Date))/100

df <- data.frame(Day, Date, Views, Engagement, Sales) %>%
    group_by(Day) %>%
    summarise(mean_engagement = mean(Engagement), 
              mean_views = mean(Views), 
              mean_sales = mean(Sales))

df
J. Trimarco
  • 149
  • 1
  • 8