
The data.frame my_data consists of two columns ("PM2.5" and "year") and around 6,400,000 rows. It holds PM2.5 pollutant-level data points for the years 1999, 2002, 2005 and 2008. This is what I have done to the data.frame so far:

{ 
my_data <- arrange(my_data,year)

my_data$year <- as.factor(my_data$year)
my_data$PM2.5 <- as.numeric(my_data$PM2.5)
}

I want to find the sum of all PM2.5 levels (i.e. the sum of all data points under PM2.5) for each year. How can I do it?

The image shows the first 20 rows of the data.frame. Because the "year" column is sorted, only 1999 appears.

VIVEK
    What have you tried? A reproducible example would be nice (nobody wants to type in your data from an image) http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Heroka Jul 21 '15 at 18:54

1 Answer


Say this is your data:

library(plyr) # <- don't forget to tell us what libraries you are using

# give us an easy sample set
my_data <- data.frame(year = sample(c("1999", "2002", "2005", "2008"), 10, replace = TRUE), PM2.5 = rnorm(10, mean = 5))
my_data <- arrange(my_data,year)

my_data$year <- as.factor(my_data$year)
my_data$PM2.5 <- as.numeric(my_data$PM2.5)

> my_data
   year    PM2.5
1  1999 5.556852
2  2002 5.508820
3  2002 4.836500
4  2002 3.766266
5  2005 6.688936
6  2005 5.025600
7  2005 4.041670
8  2005 4.614784
9  2005 4.352046
10 2008 6.378134

One way to do it (out of many, many ways already shown by a simple Google search):

> with(my_data, (aggregate(PM2.5, by=list(year), FUN="sum")))
  Group.1         x
1    1999  5.556852
2    2002 14.111586
3    2005 24.723037
4    2008  6.378134
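
Two other base R options, as a rough sketch: the formula interface of aggregate keeps the original column names instead of Group.1 and x, and tapply returns a named vector. The sample data and the names sums_by_year and sums_vec are made up for illustration; they are not from the question.

```r
# Hypothetical sample data standing in for the real my_data
set.seed(42)
my_data <- data.frame(
  year  = factor(sample(c("1999", "2002", "2005", "2008"), 10, replace = TRUE)),
  PM2.5 = rnorm(10, mean = 5)
)

# Formula interface: the result has columns named "year" and "PM2.5"
sums_by_year <- aggregate(PM2.5 ~ year, data = my_data, FUN = sum)

# tapply: one sum per level of year, returned as a named vector
sums_vec <- tapply(my_data$PM2.5, my_data$year, sum)

print(sums_by_year)
print(sums_vec)
```

If the real PM2.5 column contains NA values, note that the formula interface drops those rows by default, while tapply needs na.rm = TRUE passed through, e.g. tapply(my_data$PM2.5, my_data$year, sum, na.rm = TRUE).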
N8TRO