I have a daily revenue dataset df covering 2016-01-01 to 2017-05-21. It contains the variables Datum (date), a language code and Opbrengst (revenue).

       Datum    lanuage  Opbrengst
596    20160101  bg       254
923    20160101  bg-bg    434
1044   20160101  ca       115
1544   20160101  ca-es    238
2008   20160101  cs       251
....

I want to sum Opbrengst for each Datum.

I've tried the method from "How to sum a variable by group?":

 aggregate(Datum ~ Opbrengst, data=df, FUN="sum")

or

 tapply(df$Datum, df$Opbrengst, FUN=sum)

The results become

       Opbrengst     Datum
1             10   786304986
2            100  1048457710
3           1000   221796843
4        1000,01    20160628
5        1000,78    20170104

This is not the result I want. I want the sum of the revenue for each date. Where is the problem?

Sheryl
  • Did you mean `aggregate(Opbrengst~Datum, df1, sum)`? – akrun May 30 '17 at 12:26
  • It doesn't matter how quick you type, @akrun is always there first... :-) – Phil May 30 '17 at 12:30
  • @akrun I've tried `aggregate(Opbrengst~Datum, df, sum)` but it shows `Error in Summary.factor(c(5646L, 9263L, 647L, 5198L, 5556L, 384L, 7080L, : ‘sum’ not meaningful for factors` – Sheryl May 30 '17 at 12:30
  • @Sheryl It means your Opbrengst column is not numeric as we assumed. Do you have `,` etc. in `Opbrengst`? In that case, run `df$Opbrengst <- as.numeric(gsub(",", "", df$Opbrengst))` and then apply the code – akrun May 30 '17 at 12:32
  • @akrun, Thanks a lot! Problem solved! :) – Sheryl May 30 '17 at 12:36

1 Answer

We have two problems.

1) The position of the grouping variable in the formula method of aggregate. The grouping variable (Datum) goes on the rhs of ~, while the variable of interest (Opbrengst) goes on the lhs.

aggregate(Opbrengst~Datum, df, sum)
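
As a minimal sketch of the formula orientation, here is the same call on a few made-up rows shaped like the question's data. The data frame `toy` and its values are assumptions for illustration only, and Opbrengst is already numeric here.

```r
# Toy data in the shape of the question's df (values are made up)
toy <- data.frame(
  Datum     = c(20160101, 20160101, 20160102, 20160102),
  Opbrengst = c(254, 115, 100, 50)
)

# Variable to summarise on the left of ~, grouping variable on the right
per_day <- aggregate(Opbrengst ~ Datum, data = toy, FUN = sum)
print(per_day)
# One row per Datum: 20160101 -> 369, 20160102 -> 150
```

Swapping the two sides, as in the question, sums the Datum values within each distinct Opbrengst instead, which explains the odd output shown there.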

2) The column 'Opbrengst' is a factor. It seems to contain a , character, which results in the factor class when the data is read (if we don't specify stringsAsFactors = FALSE in read.csv/read.table etc.). One option is to remove the , with gsub, convert to numeric and then use aggregate. Note that removing the , treats it as a thousands separator.

df$Opbrengst <- as.numeric(gsub(",", "", df$Opbrengst))
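
Values such as 1000,01 in the question's output look like decimal commas rather than thousands separators. If that assumption holds, deleting the comma would turn 1000,01 into 100001, so a variant that swaps the comma for a period may be safer. This is a sketch on made-up data (`toy` and its values are hypothetical), not the original answer's method.

```r
# Toy data: Opbrengst stored as a factor with decimal commas (values are made up)
toy <- data.frame(
  Datum     = c(20160628, 20160628, 20170104),
  Opbrengst = factor(c("1000,01", "10", "1000,78"))
)

# Convert factor -> character -> numeric; as.numeric() directly on a factor
# would return the internal level codes instead of the values.
toy$Opbrengst <- as.numeric(sub(",", ".", as.character(toy$Opbrengst), fixed = TRUE))

# Sum revenue per date
per_day <- aggregate(Opbrengst ~ Datum, data = toy, FUN = sum)
print(per_day)
```

If the file can be re-read, passing `dec = ","` to read.table/read.csv should parse such a column as numeric from the start, assuming every value in it follows that convention.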
João Daniel
akrun