I am learning R and using the seas package to help with some calculations. My data is a time series in the format the seas package expects:

require(seas)
data(mscdata)
dat.int <- (mksub(mscdata, id=1108447))

The data covers 20 years and has the following column headings:

  year yday  date t_max t_min t_mean rain snow precip

I now need to calculate, for each month, the number of days on which rainfall is >= 1.0 mm. The result should have two columns: the month (for each year) and the total number of days in that month with rainfall >= 1.0 mm.

I'm not certain how to write this code, and any help would be appreciated.

Thank you

Lam

  • Post some reproducible R code using `dput(yourdataframe)`. *"data is the same format as SEAS package likes"* is not acceptable. – smci Oct 20 '14 at 20:01
  • Hi there, I am not sure what you are asking me, but I edited the question, so I hope it makes better sense. This is my first attempt at programming, so please forgive me if I am not expressing things the right way. – Lam Oct 20 '14 at 20:17

1 Answer

*"I now need to calculate the number of days in each month rainfall is >= 1.0mm. So at the end of it, I would have two columns (each month in each year and total # of days in each month rainfall >= 1.0mm)"*

1) dat.int$date is a Date object, so the first step is to create a new column dat.int$yearmon holding the year-month, e.g. using zoo::as.yearmon (see the related question "Extract month and year from a zoo::yearmon object"):

require(zoo)
# date is already a Date object, so no format string is needed here
dat.int$yearmon <- as.yearmon(dat.int$date)

2) Second, do a summarize operation (I recommend plyr or the newer dplyr) that counts the days with rain >= 1.0, grouped by yearmon. Let's name the resulting column rainy_days.

If you want to store rainy_days column back into the dat.int dataframe, you use a transform instead of a summarize:

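require(plyr)
# transform keeps every daily row and adds the per-month count to each one
dat.int <- ddply(dat.int, .(yearmon), transform, rainy_days=sum(rain >= 1.0) )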

or else if you really just want a new summary dataframe:

require(plyr)
rainydays_by_yearmon <- ddply(dat.int, .(yearmon), summarize, rainy_days=sum(rain >= 1.0) )
print.data.frame(rainydays_by_yearmon)

     yearmon rainy_days
1   Jan 1975         14
2   Feb 1975         12
3   Mar 1975         13
4   Apr 1975          6
5   May 1975          6
6   Jun 1975          5
...
355 Jul 2004          3
356 Aug 2004          7
357 Oct 2004         14
358 Nov 2004         16
359 Dec 2004         19
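
For comparison, here is a rough sketch of the same summary in dplyr, which the answer mentions as the newer alternative. The data frame `toy` and its rainfall values are made up for illustration, and a plain "YYYY-MM" string from `format()` stands in for the zoo yearmon column so the example runs without zoo. The `na.rm = TRUE` is also an addition, assuming you want missing readings skipped rather than turning the whole month into NA.

```r
library(dplyr)

# Hypothetical toy data standing in for dat.int (values are invented)
toy <- data.frame(
  date = seq(as.Date("2004-01-01"), by = "day", length.out = 60),
  rain = rep(c(0, 1.2, 0.3, 5.0), length.out = 60)
)

# Year-month key as a plain string, e.g. "2004-01" (stand-in for yearmon)
toy$yearmon <- format(toy$date, "%Y-%m")

# One row per month: number of days with rain >= 1.0 mm
rainy <- toy %>%
  group_by(yearmon) %>%
  summarise(rainy_days = sum(rain >= 1.0, na.rm = TRUE))

print(rainy)
```

With the real data you would group `dat.int` by its `yearmon` column in the same way.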

Note: you can do the above in plain old R, without the zoo or plyr/dplyr packages, but I might as well teach you nicer, more scalable, more maintainable code idioms.
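
As a minimal sketch of what the plain-R route might look like (one possible approach, not necessarily what the answer had in mind): build the year-month key with `format()` and count with `aggregate()`. The `toy` data frame and its values are hypothetical.

```r
# Hypothetical toy data: 90 consecutive days with invented rainfall in mm
toy <- data.frame(
  date = seq(as.Date("2004-01-01"), by = "day", length.out = 90),
  rain = rep(c(0, 0.4, 1.0, 2.5, 0), length.out = 90)
)

# Year-month key built with base R only, e.g. "2004-01"
toy$yearmon <- format(toy$date, "%Y-%m")

# Count the days per month with rain >= 1.0 mm
rainy <- aggregate(rain ~ yearmon, data = toy,
                   FUN = function(x) sum(x >= 1.0, na.rm = TRUE))
names(rainy)[2] <- "rainy_days"

print(rainy)
```

The string key sorts chronologically because the year comes first, which is why "%Y-%m" is used here rather than "%b %Y".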

smci
  • Hi, I have another question about this dataset. I am trying to subset the data by month (e.g. I want all Januaries and their rainy days together). I tried `months <- subset(rainydays_by_yearmon, month == 1)` but it is not working. What am I doing wrong? – Lam Oct 22 '14 at 13:59
  • As in "all Januaries, across all years"? Then you'll want a 'month' column; read `Date, zoo` packages and SO to get the month from your Date object. – smci Oct 22 '14 at 15:42
  • You need to create another new column called 'month'. – smci Oct 22 '14 at 18:22
  • I tried the column name month with the code above, but I still got an empty result: `[1] month rainy_days <0 rows> (or 0-length row.names)`. – Lam Oct 22 '14 at 18:57
  • There *is* no column 'month', currently. You need to **create** one. Like I said three times now. `dat.int$month <- youfigureoutwhatfunctiontouse(dat.int$date, maybeanotherarg)` – smci Oct 22 '14 at 22:02
  • Thanks. Perhaps my English was bad, but I did have a column named month and tried my code above, and it didn't work. I am sorry for the inconvenience I caused you. Thanks for your time. – Lam Oct 23 '14 at 12:31
  • If we need to see your new code, post a totally new question. (Link it here) – smci Oct 23 '14 at 19:52
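
For readers following the comment thread, here is one hedged sketch of how such a month column could be created and used to pull out all Januaries. It uses base R only. The table `summary_tbl`, its `month_start` column, and the counts are made up for illustration. With a zoo yearmon column, `format(x, "%m")` should work the same way, but that is an assumption worth checking against the zoo documentation.

```r
# Hypothetical summary table: one row per month, keyed by a first-of-month Date
summary_tbl <- data.frame(
  month_start = seq(as.Date("2003-01-01"), by = "month", length.out = 24),
  rainy_days  = rep(c(14, 12, 13, 6, 6, 5, 3, 7, 9, 14, 16, 19), times = 2)
)

# Create the month column explicitly (1 = January, ..., 12 = December)
summary_tbl$month <- as.integer(format(summary_tbl$month_start, "%m"))

# All Januaries across all years
januaries <- subset(summary_tbl, month == 1)

print(januaries)
```

If `subset()` returns zero rows, checking `str()` on the data frame shows whether `month` really exists there and whether it is numeric or character.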