I am new to R. I have daily data and want to separate the months whose mean is less than 1 from the rest of the data, then do something with the daily data from the months whose mean is greater than 1. The important thing is not to touch the daily values from months whose mean is less than 1.

I have used `aggregate(file, as.yearmon, mean)` to get the monthly means, but I am failing to grasp how to use them to exclude a specific month's daily values from the analysis. Any suggestion to get started would be highly appreciated.

Here is a small subset of the data, reproduced with `dput`:

structure(list(V1 = c(0, 0, 0, 0.43, 0.24, 0, 1.06, 0, 0, 0, 1.57, 1.26, 1.34, 0, 0, 0, 2.09, 0, 0, 0.24)), .Names = "V1", row.names = c(NA, 20L), class = "data.frame")

A snippet of code I am using:

library(zoo)
file <- read.table("text.txt")
x_daily <- zooreg(file, start=as.Date("2000-01-01"))
x1_daily <- x_daily[]
con_daily <- subset(x1_daily, aggregate(x1_daily,as.yearmon,mean) > 1 ) 
Ibe
  • Take a look at `?subset`. You can also use `dput` to give us a better example of your data. – AndrewMacDonald Jul 02 '14 at 19:18
  • How do you suggest I use `dput`? – Ibe Jul 02 '14 at 19:32
  • I read my text file `ifile <- read.table(file.txt)` and then `dput(ifile)` gave me a long list of values in file with these at the end ` .Names = "V1", class = "data.frame", row.names = c(NA, -10950L))` – Ibe Jul 02 '14 at 19:41
  • See this answer to "how to make a great reproducible example": http://stackoverflow.com/a/5963610/1727133 – AndrewMacDonald Jul 02 '14 at 19:45
  • I got the following: `structure(list(V1 = c(0, 0, 0, 0.43, 0.24, 0, 1.06, 0, 0, 0, 1.57, 1.26, 1.34, 0, 0, 0, 2.09, 0, 0, 0.24)), .Names = "V1", row.names = c(NA, 20L), class = "data.frame")` – Ibe Jul 02 '14 at 19:49
  • Yes, but that can't be the right data; it has no factor called `as.yearmon`! – AndrewMacDonald Jul 02 '14 at 19:51
  • Ibe, your code and example should be completely self-contained so that we don't need to guess what things are. [The link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) that @AndrewMacDonald posted explains why this is important, not least for getting you the answer that you need. – Andy Clifton Jul 02 '14 at 20:02

1 Answer

Let's create some sample data:

feb2012 <- data.frame(year=2012, month=2, day=1:28, data=rnorm(28))
feb2013 <- data.frame(year=2013, month=2, day=1:28, data=rnorm(28) + 10)
jul2012 <- data.frame(year=2012, month=7, day=1:31, data=rnorm(31) + 10)
jul2013 <- data.frame(year=2013, month=7, day=1:31, data=rnorm(31) + 10)
d <- rbind(feb2012, feb2013, jul2012, jul2013)
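
The question's data is a single column of daily values with a known start date (2000-01-01 in the asker's snippet). Here is a minimal sketch, assuming that layout, of deriving the year, month and day columns from a date sequence so that no month has to be typed by hand. The names `values`, `dates` and `d_auto` are illustrative and do not come from the original post.

```r
# Daily values copied from the question's dput() output
values <- c(0, 0, 0, 0.43, 0.24, 0, 1.06, 0, 0, 0,
            1.57, 1.26, 1.34, 0, 0, 0, 2.09, 0, 0, 0.24)

# One date per value, assuming consecutive days starting 2000-01-01
dates <- seq(as.Date("2000-01-01"), by = "day", length.out = length(values))

# Same column layout as the sample data frame d
d_auto <- data.frame(
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  day   = as.integer(format(dates, "%d")),
  data  = values
)
head(d_auto, 3)
```

With a longer series the same three `format()` calls cover every year and month, so the steps below should apply unchanged to `d_auto`.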

You can get an aggregate of the data column by month like this:

> a <- aggregate(d$data, list(year=d$year, month=d$month), mean)
> a
  year month           x
1 2012     2  0.09704817
2 2013     2  9.93354271
3 2012     7 10.19073868
4 2013     7  9.78324133

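Perhaps not the best way, but an easy way to filter the `d` data frame by the mean of the corresponding year and month is to work with a temporary data frame that merges `d` and `a`. Because both share the `year` and `month` columns, `merge` joins on them, so every daily row gains a column `x` holding its month's mean: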

work <- merge(d, a)
subset(work, x > 1)
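
An alternative that avoids the temporary merge is base R's `ave()`, which returns the group mean repeated for every row. The sketch below uses a small deterministic data frame (`d2`, an illustrative name) instead of the random sample above so the result is predictable. It is one possible approach, not necessarily the best one.

```r
# Two months of made-up daily data: January averages below 1, February above 1
d2 <- data.frame(
  year  = 2000,
  month = rep(c(1, 2), each = 4),
  day   = rep(1:4, times = 2),
  data  = c(0, 0.4, 0.2, 0, 1.5, 2.0, 0, 3.0)
)

# Attach each month's mean to every daily row of that month
d2$monthly_mean <- ave(d2$data, d2$year, d2$month, FUN = mean)

# Rows to process, and rows that must be left untouched
to_process <- d2[d2$monthly_mean > 1, ]
untouched  <- d2[d2$monthly_mean <= 1, ]
```

Keeping both pieces makes it possible to transform `to_process` and later `rbind` it back with `untouched`, which matches the requirement of not touching the low-mean months.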

I hope this will help you get started!

janos
  • This is exactly what I want to do. I have 28 years of data, which means 28 Januaries. Is it possible to bind all the months automatically, the way you did by hand for Feb and Jul? – Ibe Jul 02 '14 at 20:13
  • The only issue I see here is that the data covers 28 years, which means 336 months. If I have to type out every month like `feb2012`, `feb2013`, etc., it will consume a lot of time. I need to do it automatically in code. – Ibe Jul 02 '14 at 21:26
  • I generated the sample data frame above in a way that's easy to understand and reproduce. I don't know how your actual data is organized, and you did not explain that in your question. If you have difficulty getting your data into a similar data frame, it would be better to post a new question, focusing on that part. – janos Jul 02 '14 at 21:46