-3

I have a dataframe with 213 rows and 2 columns (Date and Probability). Thanks to a previous post I am able to reduce the number of rows by grouping the Date by quarters. The problem is that, now, the second column, Probability, is composed by numbers that I want to average accordingly.

Let's take an example.

Date <- c("2000-01-05", "2000-02-03", "2000-03-02", "2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06", "2000-09-14", "2000-10-05", "2000-10-19", "2000-11-02", "2000-12-14")
Article <- c(0.5, 1, 0.3, 0.8, 0.7, 2, 3, 1.5, 1, 2, 0.6, 0.5, 0.5)

Date <- data.frame(Date)
Article <- data.frame(Article)

df <- cbind(Date, Article)

#Dataframe

Date           Probability
1  2000-01-05   0.5
2  2000-02-03   1
3  2000-03-02   0.3
4  2000-03-30   0.8
5  2000-04-13   0.7
6  2000-05-11   2
7  2000-06-08   3
8  2000-07-06   1.5
9  2000-09-14   1
10 2000-10-05   2
11 2000-10-19   0.6
12 2000-11-02   0.5
13 2000-12-14   0.5

The final output I would like to obtain is the following:

Date         Probability
1  2000 Q1   0.65
2  2000 Q2   1.9
3  2000 Q3   1.25
4  2000 Q4   0.9

Essentially, the rows have been grouped together by quarters and the numbers associated have been averaged accordingly.

I have no idea how to do it, unfortunately.

Can anyone help me out?

Thanks!

Rollo99
  • 1,601
  • 7
  • 15
  • 2
    You can take my previous answer and just change it to calculate the mean: `df %>% group_by(Date = as.character(lubridate::quarter(ymd(Date), with_year = TRUE))) %>% summarise(Article = mean(Article))`. – tmfmnk Oct 19 '19 at 21:22
  • 2
    Not really :) Take a look on how `group_by()`, `mutate()` and `summarise()` from `dplyr` work and you will understand it quickly. – tmfmnk Oct 19 '19 at 21:28
  • 1
    Possible duplicate of [How to sum a variable by group](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group) – DanY Oct 19 '19 at 22:02
  • 2
    `library(zoo); aggregate(read.zoo(df), as.yearqtr, mean)` – G. Grothendieck Oct 20 '19 at 02:26

2 Answers2

1

Base R solution:

  # Summarise the dataframe: 

summary_df <- aggregate(list(Probability = df$Article),  

                    by = list(Date = paste(gsub("[-].*", "", df$Date), quarters(df$Date), sep = " ")),

                    FUN = mean)

Data:

df <- data.frame(Date = as.Date(c("2000-01-05", "2000-02-03", "2000-03-02", "2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06", "2000-09-14", "2000-10-05", "2000-10-19", "2000-11-02", "2000-12-14"), "%Y-%m-%d"),
                   Article = c(0.5, 1, 0.3, 0.8, 0.7, 2, 3, 1.5, 1, 2, 0.6, 0.5, 0.5))
hello_friend
  • 5,682
  • 1
  • 11
  • 15
1

Combining data.table with lubridate:

Code

require(data.table); require(lubridate)

setDT(df)
df = df[, .(MeanProb = mean(Article)), .(Qtr = as.yearqtr(as.Date(Date)))]

Output

> df
       Qtr MeanProb
1: 2000 Q1     0.65
2: 2000 Q2     1.90
3: 2000 Q3     1.25
4: 2000 Q4     0.90
JDG
  • 1,342
  • 8
  • 18