5

I have a dataframe with two columns (year and precipitation). In a single column, the year is listed such that it starts from 1900 and ends at 2014 and again starts with 1900. In another column I have precipitation value of the respective year. Now i want to add all the precipitation of 1900 as 1 value and 1901 as 1 to up to 2014. My data looks like:

Year    Precipitation

1900    4.826
1901    37.592
2014    14.224
1900    45.974
1901    46.228
2014    79.502
1900    52.578
1901    22.30
2014    15.25

The results should look like:

Year   Precipitation

1900   103.378
1901   106.12
2014   108.976

So far I wrote a code but it does not work, if anybody can fix it?

data=read.table('precipitation.csv',header=T,sep=',')
frame=data.frame(data)
cumcum=tapply(frame$Precipitation, cumsum(frame$year==1), FUN=sum, na.rm=TRUE)

Thanks

csiu
  • 3,159
  • 2
  • 24
  • 26
Juvin
  • 115
  • 1
  • 2
  • Why do you have duplicate values for Year, are they Quarters or Months or something? – smci Mar 31 '15 at 07:56

3 Answers3

4

Try data.table

library(data.table)
frame=fread('precipitation.csv',header=TRUE,sep=',')    
frame[, sum(Precipitation), by = Year]
vrajs5
  • 4,066
  • 1
  • 27
  • 44
4

1 liner -- try:

aggregate(frame['Precipitation'], by=frame['Year'], sum)

Reference: Consolidate duplicate rows

Community
  • 1
  • 1
csiu
  • 3,159
  • 2
  • 24
  • 26
2

That seems overly complicated. Why not just do the sums separately?

s.1900 <- sum(frame$Precipitation[frame$year == 1900])
s.1901 <- sum(frame$Precipitation[frame$year >= 1901 & frame$year <= 2013])
s.2014 <- sum(frame$Precipitation[frame$year == 2014])

It actually leaves your code readable for later.

LauriK
  • 1,899
  • 15
  • 20