-2

suppose we have a data set like this:

Year State SomeValue

2000 NY   1000
2000 NY   1200
2000 NY   1100
2001 NY   2000
2001 NY   2200
...

How can I aggregate all 3 columns to have a data set look like:

year state somevalue
2000 NY 3300
2001 NY 4200


aggregate(data$year, list(data$state, data$somevalue), data, sum)

is this the correct way to do it ?

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
Mefhisto1
  • 2,188
  • 7
  • 34
  • 73
  • Hi, Take a bit of time and read the tag excerpt before tagging. [tag:dataframes] is for pandas, whereas you need [tag:data.frame] here. Be careful the next time. See this meta post. [Warn \[r\] users from adding \[dataframes\] tag instead of \[data.frame\] tag](http://meta.stackoverflow.com/q/318933) – Bhargav Rao Mar 14 '16 at 13:42

5 Answers5

1

Hmm there are numerous ways to aggregate data in R. Using aggregate(), you could e.g. do

aggregate(SomeValue ~ Year+State, data=data, FUN=sum)

or

with(data, aggregate(x = SomeValue, by = list(Year=Year, State=State), FUN = sum))
lukeA
  • 53,097
  • 5
  • 97
  • 100
1

You should probably spend some time with the basics of R syntax before doing such things -- your attempt hints at a fundamental misunderstanding of some things in R, but the following might do the trick for you:

x <- aggregate(somevalue ~ year + state, data=data, FUN=sum)
Livius
  • 3,240
  • 1
  • 17
  • 28
0
library(sqldf)
sqldf( "select year , state , sum( somevalue ) as sum_somevalue from data group by year , state" )
Anthony Damico
  • 5,779
  • 7
  • 46
  • 77
0

if dat is the dataset

library(dplyr)
dat %>%
group_by(year, state) %>%
summarise(somevalue=sum(somevalue))
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Your suggested solution was very close. Here is a slightly modified version that does not return a warning:

data <- read.table(text='
year state somevalue
2000 NY   1000
2000 NY   1200
2000 NY   1100
2001 NY   2000
2001 NY   2200
', header=TRUE)

aggregate(data$somevalue, list(data$state, data$year), sum)

Although, @lukeA's solutions and @Livius's solution are better in base R because they return the desired column names in the requested order.

This returns two of the three requested column names:

aggregate(data$somevalue, list(state=data$state, year=data$year), sum)

My second solution differs from LukeA's second solution only in that I did not use the with function and did not label the aggregate options. His answer is better.

Mark Miller
  • 12,483
  • 23
  • 78
  • 132