0

I am running this command:

aggregated_quarterly_realised <- aggregate(merged_dataset$dependent_variable, list(merged_dataset$qy), mean)

which gives me the total amount per quarter. But I would like to get separately the sums in case the merged_dataset$dependent_variable is equal to 0, 1, and the total. Thus I would like to get three values per quarter. How can I do that?

EDIT:

> dput(head(merged_dataset$dependent_variable,10))
c(0, 0, 0, 0, 1, 0, 0, 0, 1, 0)
> dput(head(merged_dataset$qy,10))
structure(c(2008.25, 2008.25, 2008.50, 2008.75, 2009.25, 2009.50, 
2008.25, 2008.25, 2008.25, 2008.25), class = "yearqtr")
> dput(head(merged_dataset$test,10))
c(7101273.07, 6855586.59, 800585.78, 8029604.44, 6707122.59, 
646079.46, 14598.96, 1303978, 15244705, 322058.74)

What I want is the aggregated values per quarter (quarters are in the merged_dataset$qy variable) for the test variable (merged_dataset$test) separately for the values 0 of the dependent variable, the value 1, and the total.

adrCoder
  • 3,145
  • 4
  • 31
  • 56
  • 1
    Hi, please provide us with a [minimal reproductible example](https://stackoverflow.com/a/5963610/2414988) (ie. some data to run your code) to make it easier for us to help you. – Biblot Oct 18 '19 at 15:12
  • 1
    Hi @SamuelDiebolt I edited my post and added some dput with data. Please let me know if that is ok. – adrCoder Oct 21 '19 at 07:45

1 Answers1

1

Using data.table:

Code

dtf = dt[, .(Dep1sum = sum(test[depvar == 1]), 
             Dep0sum = sum(test[depvar == 0]),
             Sum = sum(test)), .(qy)]

Result

> dtf
        qy  Dep1sum    Dep0sum        Sum
1: 2008.25 15244705 15597495.4 30842200.4
2: 2008.50        0   800585.8   800585.8
3: 2008.75        0  8029604.4  8029604.4
4: 2009.25  6707123        0.0  6707122.6
5: 2009.50        0   646079.5   646079.5

Data

dt = data.table(
  depvar = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 0),
  qy = c(2008.25, 2008.25, 2008.50, 2008.75, 2009.25, 2009.50, 2008.25, 2008.25, 2008.25, 2008.25),
  test = c(7101273.07, 6855586.59, 800585.78, 8029604.44, 6707122.59, 646079.46, 14598.96, 1303978, 15244705, 322058.74)
)
JDG
  • 1,342
  • 8
  • 18
  • I get the following error `Error in .(Dep1sum = sum(test[depvar == 1]), Dep0sum = sum(test[depvar == : could not find function "."` – adrCoder Oct 21 '19 at 11:11
  • Have you loaded the `data.table` library? Execute `require(data.table)`. Afterwards, ensure that your data is in data.table format by executing `setDT(merged_dataset)`. – JDG Oct 21 '19 at 11:25
  • No problem! `data.table` is highly useful for data wrangling like this. It also outperforms other packages in terms of speed as data scales in size (https://h2oai.github.io/db-benchmark/). See https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html for an introductory tour. – JDG Oct 21 '19 at 11:38
  • Sorry, it is actually not right. The sum is smaller than Dep1sum which should not be the case.. why is this happening? Can you fix it please? In your data it seems to be right tho.. weird – adrCoder Oct 21 '19 at 11:40
  • I took a look at the links. Indeed data.table seems to have an amazing performance. I was not aware of that and I was using data frames all the time. Interesting! Thanks! – adrCoder Oct 21 '19 at 11:43
  • Are you sure that you are observing Dep1sum > Sum? You can verify this by executing `dtf[, Sum >= Dep1sum]`. In your example case (and my output case), `Sum` is always greater than or equal to `Dep1sum`. Output below. `[1] TRUE TRUE TRUE TRUE TRUE` – JDG Oct 21 '19 at 11:45
  • 1
    You are right. My bad!!! Sorry! I was getting `2008 Q2 291159243,7 1127901086 1419060330 ` and got confused – adrCoder Oct 21 '19 at 11:47