-1

I have 2 tables (a and b) with 365 records each (1 year data). I want to get mean of each month in table a and if it falls below 0.01 then remove all daily values belonging to that month and output a new table. Also, I want corresponding daily values to be removed from table b as well to produce a new table for it.

For example: If January and April have monthly mean less than 0.01 then output table a and b with 304 values each. Outputs from dput(head(a)) and dput(head(b)) are respectively:

structure(list(V1 = c(0, 0, 0, 0.43, 0.24, 0)), .Names = "V1", row.names = c(NA, 6L), class = "data.frame")

structure(list(V1 = c(0.042022234, 0.014848409, 0.275174289, 0.485364883, 0.177960815, 0.006799459)), .Names = "V1", row.names = c(NA, 6L), class = "data.frame")

I don't know how to use list comprehension in R. Any suggestions would be appreciative.

Ibe
  • 5,615
  • 7
  • 32
  • 45
  • You really need to try something and then show us what you've tried. 2 hints: `?"["` & `?as.POSIXlt` – Jesse Anderson Nov 26 '14 at 01:32
  • Is a table a `data.frame`, or a `table`? Or is it a `list` like the tag suggests? Please make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – GSee Nov 26 '14 at 01:40
  • I have a text file and read it with `read.table` – Ibe Nov 26 '14 at 01:41
  • edit your question to include the output of `dput(head(a))` and `dput(head(b))` – GSee Nov 26 '14 at 01:42

3 Answers3

1

If table means data.frame and data structure is similar to @eclark's sample data, you could try something like this using dplyr.

DATA

set.seed(123)
a <- data.frame(Date=seq.Date(from = as.Date("2013-01-01"),to = as.Date("2013-12-31"),
                length.out = 365), value=rnorm(n = 365,mean = .01,sd = .1))

b <- data.frame(Date=seq.Date(from = as.Date("2013-01-01"),to = as.Date("2013-12-31"),
                length.out = 365), value=rnorm(n = 365,mean = .01,sd = .15))

CODE

library(dplyr)

# Create a column with month
mutate(a, month = as.character(format(Date, "%m"))) -> a
mutate(b, month = as.character(format(Date, "%m"))) -> b

# Get mean for each month and get months with average lower than 0.01 in the data frame, a
summarise(group_by(a, month), average = mean(value)) %>%
filter(average < 0.01) -> wutever

#wutever
#Source: local data frame [5 x 2]
#
#  month       average
#1    01  0.0068172630
#2    04  0.0006111069
#3    05 -0.0052247522
#4    08  0.0008155293
#5    12  0.0054872409

# Remove data points including months in wutever from a and b
filter(a, !month %in% wutever$month) -> newA
filter(b, !month %in% wutever$month) -> newB   
jazzurro
  • 23,179
  • 35
  • 66
  • 76
  • I have single column with values and don't have date column as part of data frame – Ibe Nov 26 '14 at 02:50
  • @Ibe Where do you have date then? – jazzurro Nov 26 '14 at 02:50
  • @jazzuro: Data was downloaded by specifying start and end date from weather service. But table only contain values for that period – Ibe Nov 26 '14 at 02:52
  • @Ibe In that case, you need to create a sequence of date and create a data frame / data table including the dates and values you have. That will be presumably similar to what eclark created in his/her answer. – jazzurro Nov 26 '14 at 02:56
0

Not really the most elegant or fast way but here is an idea:

a <- data.frame(Date=seq.Date(from = as.Date("2013-01-01"),to = as.Date("2013-12-31"),length.out = 365),a=rnorm(n = 365,mean = .01,sd = .1)) 
b <- data.frame(Date=seq.Date(from = as.Date("2013-01-01"),to = as.Date("2013-12-31"),length.out = 365),b=rnorm(n = 365,mean = .01,sd = .15)) 
require(dplyr)
c <- merge(a,b,by=1)
c <- tbl_df(data = c)
c <- mutate(c, month=substr(c$Date,6,7))
d <- summarise(group_by(c, month),am = mean(a),bm=mean(b))
c <- left_join(c,d)
c <- filter(c, c$am>=.01 & c$bm>= .01)
a <- c[,c(2,3)]
b <- c[,c(2,4)]
remove(c,d)
eclark
  • 819
  • 7
  • 16
0

Using base function alone, and assuming your two data frames have variables day, month, and value:

> new_a <- do.call(rbind, by(a, a$month, function(df) {
      ifelse(mean(df$value) < 0.01, NULL, df)
  }))
> new_b <- subset(b, day %in% new_a$day)

Or, you can use the plyr package and try:

> new_a <- ddply(a, .(month), function(df) ifelse(mean(df$value) < 0.01, NULL, df))
> new_b <- subset(b, day %in% new_a$day)
mmuurr
  • 1,310
  • 1
  • 11
  • 21
  • I only have one variable in data frame that holds values. Is it possible to use only values instead of having `day` and `month` variables? – Ibe Nov 26 '14 at 02:05
  • I'm confused... how do you know which value corresponds to which day? You might need to clarify your question a bit. – mmuurr Nov 26 '14 at 03:42
  • I somehow was able to export date information into downloaded file but it is in following `19800101`. I need date format to convert it into only days, right? – Ibe Nov 26 '14 at 04:21