2

I am trying to sum the revenue values of quarters of the same year. In Excel there is the sumif function that allows you to do so. I am using the R equivelnt, aggregate but I am struggling with getting it to work. The sum figures in the output are completely incorrect. What am I doing wrong?

> aggregate( . ~ dateyear,data=x,sum)
   dateyear revenue
1      2001     130
2      2002     176
3      2003     155
4      2004     159
5      2005     150
6      2006     161
7      2007     144
8      2008     120
9      2009      69
10     2010      54
11     2011      66
12     2012      92
13     2013     116
14     2014      94
15     2015      99

dput(x)
structure(list(dateyear = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 
10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 
13L, 13L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L), .Label = c("2001", 
"2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", 
"2010", "2011", "2012", "2013", "2014", "2015"), class = "factor"), 
    revenue = structure(c(47L, 43L, 40L, 58L, 45L, 38L, 35L, 
    57L, 37L, 27L, 34L, 55L, 44L, 29L, 31L, 51L, 39L, 28L, 32L, 
    56L, 42L, 30L, 33L, 59L, 36L, 25L, 24L, 54L, 26L, 23L, 17L, 
    50L, 12L, 5L, 2L, 41L, 8L, 4L, 1L, 46L, 10L, 7L, 3L, 48L, 
    19L, 16L, 9L, 52L, 21L, 15L, 15L, 13L, 49L, 20L, 14L, 11L, 
    53L, 22L, 18L, 6L), .Label = c("1373", "1390.7", "1416.5", 
    "1420.8", "1455.2", "1472.9", "1475.1", "1482.7", "1486.3", 
    "1498.8", "1499.1", "1505.3", "1506.9", "1512.9", "1516.8", 
    "1525.2", "1546.1", "1550.8", "1583.2", "1588.5", "1589.4", 
    "1613.4", "1646.5", "1674.2", "1689.1", "1713.6", "1721.5", 
    "1728.5", "1730.1", "1748.6", "1755.1", "1761.2", "1762.6", 
    "1764.5", "1794.3", "1799.5", "1813.9", "1818", "1838.7", 
    "1872.3", "1875.4", "1879", "1885.6", "1911.9", "1972.8", 
    "1977", "1977.6", "2009.4", "2078.7", "2082.3", "2131.5", 
    "2154.1", "2179.6", "2208.1", "2299.1", "2379.6", "2387.9", 
    "2534", "2563.3"), class = "factor")), .Names = c("dateyear", 
"revenue"), row.names = c(NA, -60L), class = "data.frame")
jessica
  • 1,325
  • 2
  • 21
  • 35
  • 3
    The values in your `revenue` column are `factor`s, not numeric. Change with `as.numeric(as.character(revenue))`. – A5C1D2H2I1M1N2O1R2T1 Jan 17 '16 at 06:58
  • Thank you!! Can you explain to me what " . ~ " means?? in the aggregate command. I just tried to emulate some others people commands and fit my own variables in them. But I don't understand what " . ~ " represents. – jessica Jan 17 '16 at 07:08
  • That's just like saying "aggregate all of the columns not specified by the RHS of the `~`, grouped by the RHS". – A5C1D2H2I1M1N2O1R2T1 Jan 17 '16 at 07:09
  • @AlexandreHalm what is "easier to reproduce" has to do with anything here? If your solution works fine on the data set you've created but doesn't work on the real data set Jessica has, how is that helping her? Do you expect her use your mock data set instead of her real one? – David Arenburg Jan 17 '16 at 07:11

1 Answers1

2

As Ananda said, your revenue values are not actual numbers, strings codified as integers (i.e. a factor). When you sum them, you got those odd numbers. After you apply Ananda's code

x$revenue <- as.numeric(as.character(x$revenue))

your aggregate function should work. This is what I got:

aggregate( revenue ~ dateyear,data=x,sum)
   dateyear revenue
1      2001  5735.5
2      2002  8119.1
3      2003  7687.8
4      2004  7696.2
5      2005  7459.9
6      2006  7769.8
7      2007  7726.1
8      2008  7114.3
9      2009  6433.5
10     2010  6151.9
11     2011  6367.4
12     2012  6604.1
13     2013  8284.0
14     2014  6679.2
15     2015  6816.7
Carlos Alberto
  • 598
  • 3
  • 9
  • 1
    When answering in behalf of someone, it is better to make the answer a [Community wiki](http://meta.stackoverflow.com/a/251598/3001626) – David Arenburg Jan 17 '16 at 07:09