sum by year in a row in a dataframe in r

Question

I have a dataframe with two columns (year and precipitation). In a single column, the year is listed such that it starts from 1900 and ends at 2014 and again starts with 1900. In another column I have precipitation value of the respective year. Now i want to add all the precipitation of 1900 as 1 value and 1901 as 1 to up to 2014. My data looks like:

Year    Precipitation

1900    4.826
1901    37.592
2014    14.224
1900    45.974
1901    46.228
2014    79.502
1900    52.578
1901    22.30
2014    15.25

The results should look like:

Year   Precipitation

1900   103.378
1901   106.12
2014   108.976

So far I wrote a code but it does not work, if anybody can fix it?

data=read.table('precipitation.csv',header=T,sep=',')
frame=data.frame(data)
cumcum=tapply(frame$Precipitation, cumsum(frame$year==1), FUN=sum, na.rm=TRUE)

Thanks

Why do you have duplicate values for Year, are they Quarters or Months or something? — smci, Mar 31 '15 at 07:56

score 4 · Answer 1 · answered Mar 31 '15 at 05:23

4

Try data.table

library(data.table)
frame=fread('precipitation.csv',header=TRUE,sep=',')    
frame[, sum(Precipitation), by = Year]

answered Mar 31 '15 at 05:23

vrajs5

4,066
1
27
44

score 4 · Answer 2 · edited May 23 '17 at 12:34

4

1 liner -- try:

aggregate(frame['Precipitation'], by=frame['Year'], sum)

Reference: Consolidate duplicate rows

edited May 23 '17 at 12:34

Community

1
1

answered Mar 31 '15 at 05:25

csiu

3,159
2
24
26

score 2 · Answer 3 · answered Mar 31 '15 at 05:20

That seems overly complicated. Why not just do the sums separately?

s.1900 <- sum(frame$Precipitation[frame$year == 1900])
s.1901 <- sum(frame$Precipitation[frame$year >= 1901 & frame$year <= 2013])
s.2014 <- sum(frame$Precipitation[frame$year == 2014])

It actually leaves your code readable for later.

sum by year in a row in a dataframe in r

3 Answers3