
I want to get the values and dates corresponding to the number of occurrences. I use this function, which does what I want very well:

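count <- function(df, min_build, min_days) {
    # rle() splits the logical series (build > min_build) into runs;
    # count the TRUE runs that last at least min_days days
    sum(with(rle(df$build > min_build), values & lengths >= min_days))
}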
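To make clear what count actually counts, here is a minimal sketch on a small, made-up vector of build values (the toy data frame is hypothetical, not from the question). Note that the comparison is strict, so a value of exactly min_build does not extend a run:

```r
count <- function(df, min_build, min_days) {
    sum(with(rle(df$build > min_build), values & lengths >= min_days))
}

# Hypothetical toy data, for illustration only
toy <- data.frame(build = c(25, 26, 27, 28, 20, 25, 26, 21, 30, 31, 32, 33, 34))

r <- rle(toy$build > 24)
r$values    # TRUE FALSE TRUE FALSE TRUE
r$lengths   # 4 1 2 1 5
count(toy, 24, 4)   # 2: the runs of length 4 and 5 qualify, the run of 2 does not
```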

My data looks like:

data = data.frame(station, build, dates, Year, Month, day)

   station build     dates  Year Month day 
1   Bariko 24.5 1960-01-01  1960     1   1    
2   Bariko 29.1 1960-01-02  1960     1   2    
3   Bariko 26.4 1960-01-03  1960     1   3    
4   Bariko 29.0 1960-01-04  1960     1   4    
5   Bariko 22.0 1960-01-05  1960     1   5    
6   Bariko 25.9 1960-01-06  1960     1   6    
7   Bariko 24.2 1960-01-07  1960     1   7    
8   Bariko 23.9 1960-01-08  1960     1   8    
9   Bariko 24.4 1960-01-09  1960     1   9    
10  Bariko 24.0 1960-01-10  1960     1  10    
11  Bariko 24.2 1960-01-11  1960     1  11    
12  Bariko 24.8 1960-01-12  1960     1  12    
13  Bariko 25.4 1960-01-13  1960     1  13 
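A reproducible version of the 13 rows printed above could be built like this. It is a sketch that transcribes only those rows, assuming dates is a Date column and that Year, Month and day are derived from it:

```r
# Rebuild the 13 printed rows (assumes `dates` is a Date column)
dates <- seq(as.Date("1960-01-01"), by = "day", length.out = 13)
data <- data.frame(
    station = "Bariko",
    build   = c(24.5, 29.1, 26.4, 29.0, 22.0, 25.9, 24.2,
                23.9, 24.4, 24.0, 24.2, 24.8, 25.4),
    dates   = dates,
    Year    = as.integer(format(dates, "%Y")),
    Month   = as.integer(format(dates, "%m")),
    day     = as.integer(format(dates, "%d"))
)

count <- function(df, min_build, min_days) {
    sum(with(rle(df$build > min_build), values & lengths >= min_days))
}

count(data, 24, 4)   # 1: only 1960-01-01 to 1960-01-04 is a run of 4+ days above 24
```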

h <- count(data, 24, 4)    # correct number, but pooled over all 10 years (1960-1969)

# I split my data by year to get the value for each year.
g <- data$Year
l <- split(data, g)
k <- l[["1962"]]
h <- count(k, 24, 4)    # I repeat this 10 times (once per year)

My questions:

1. How can I detect the days that correspond to my count?

2. How can I loop over the years to get all the values in two columns (Year, Value)?

talat
NVega
Without sample data your problem is not reproducible. Please read [how to make a great reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to make it easier for others to help you. – talat May 30 '14 at 12:45

1 Answer

I'm surprised you have such a fancy R counting function but don't know how to apply a function across a list.

But first, let's start with finding the days that correspond to a count. I've updated the sample data to have more runs across more years. (To keep it simple, there are only 2 months per year and each has only 5 days.)

data <- data.frame(
    Year  = rep(1960:1969, each = 10),
    Month = rep(rep(1:2, each = 5), times = 10),   # months in calendar order
    Day   = rep(1:5, times = 20),                  # 5 days per month
    build = 24 + sin(1:100/4)*1.5
)

So rather than explicitly finding the days, I'm going to find the row indices where runs start (and end), using these two functions:

# Row index where each qualifying run starts
findstart <- function(df, min_build, min_days) {
    with(rle(df$build > min_build),
        head(cumsum(c(1, lengths)), -1)[values & lengths >= min_days])
}

findrange <- function(df, min_build, min_days) {
    with(rle(df$build > min_build), {
        m <- values & lengths >= min_days      # which runs qualify
        s <- head(cumsum(c(1, lengths)), -1)   # first row of every run
        cbind(s[m], s[m] + lengths[m] - 1)     # start and end row of each run
    })
}

They work like count, but return either the start row index of each qualifying run, or the start and end row indices as a two-column matrix:
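Since the question asks for dates rather than row indices, the indices from findrange can be used to subset a date column. A minimal sketch on a hypothetical ten-day series; the toy data and the start/end column names are assumptions added for illustration:

```r
# Same logic as findrange, with named columns added for readability
findrange <- function(df, min_build, min_days) {
    with(rle(df$build > min_build), {
        m <- values & lengths >= min_days      # runs that are long enough
        s <- head(cumsum(c(1, lengths)), -1)   # first row of every run
        cbind(start = s[m], end = s[m] + lengths[m] - 1)
    })
}

# Hypothetical ten-day series, for illustration only
toy <- data.frame(
    dates = seq(as.Date("1960-01-01"), by = "day", length.out = 10),
    build = c(25, 26, 27, 28, 20, 25, 21, 30, 31, 32)
)

fr <- findrange(toy, 24, 3)
# Map the row indices back to calendar dates
runs <- data.frame(start = toy$dates[fr[, "start"]],
                   end   = toy$dates[fr[, "end"]])
runs   # starts 1960-01-01 and 1960-01-08; ends 1960-01-04 and 1960-01-10
```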

(f <- findstart(data, 24, 4))
# [1]  1 26 51 76

(fr <- findrange(data, 24, 4))
#      [,1] [,2]
# [1,]    1   12
# [2,]   26   37
# [3,]   51   62
# [4,]   76   87
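If you would rather flag every individual day that belongs to a qualifying run (question 1), the run-length encoding can be expanded back with inverse.rle() after masking out the short runs. flag_runs is a hypothetical helper name, and the sketch rebuilds only the build column of the sample data:

```r
# Hypothetical helper: TRUE for every day inside a run of at least
# `min_days` consecutive days with build > min_build
flag_runs <- function(build, min_build, min_days) {
    r <- rle(build > min_build)
    r$values <- r$values & r$lengths >= min_days   # drop runs that are too short
    inverse.rle(r)                                 # expand back to one value per day
}

build <- 24 + sin(1:100/4)*1.5   # same build series as the sample data
in_run <- flag_runs(build, 24, 4)
which(in_run)   # rows 1:12, 26:37, 51:62 and 76:87, matching findrange
```

The resulting logical vector can be added as a column (for example data$in_run) and used to subset the rows, and hence the dates, directly.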

Then, to apply your counting function to each year in the list and get back the data you want, you can do:

g <- data$Year
l <- split(data, g)
data.frame(n=sapply(l, count, 24, 4))
#      n
# 1960 1
# 1961 0
# 1962 1
# 1963 1
# 1964 0
# 1965 1
# 1966 0
# 1967 1
# 1968 1
# 1969 0
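The question asks for two columns (Year, Value) rather than years as row names. One way is to build the data frame from the names of the per-year counts. This sketch also swaps sapply for vapply, which stops with an error if count ever returns something other than a single integer; the data keeps only the Year and build columns of the sample:

```r
count <- function(df, min_build, min_days) {
    sum(with(rle(df$build > min_build), values & lengths >= min_days))
}

data <- data.frame(Year  = rep(1960:1969, each = 10),
                   build = 24 + sin(1:100/4)*1.5)

# One count per year, checked to be a single integer each time
n <- vapply(split(data, data$Year), count, FUN.VALUE = integer(1),
            min_build = 24, min_days = 4)
res <- data.frame(Year = as.integer(names(n)), Value = unname(n))
res
```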

Splitting by month as well increases the number of runs, because many runs cross month boundaries.
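To count within each month instead, one option is to split on Year and Month together. This sketch assumes the months are laid out in calendar order (rows 1 to 5 of each year are month 1, rows 6 to 10 are month 2). With this sample it finds 8 qualifying runs, compared with 6 when splitting by year only:

```r
count <- function(df, min_build, min_days) {
    sum(with(rle(df$build > min_build), values & lengths >= min_days))
}

# Sample data with months in calendar order: 2 months x 5 days per year
data <- data.frame(
    Year  = rep(1960:1969, each = 10),
    Month = rep(rep(1:2, each = 5), times = 10),
    build = 24 + sin(1:100/4)*1.5
)

# One group per Year-Month combination
groups <- split(data, list(data$Year, data$Month), drop = TRUE)
res <- data.frame(
    Year  = vapply(groups, function(g) g$Year[1], integer(1)),
    Month = vapply(groups, function(g) g$Month[1], integer(1)),
    Value = vapply(groups, count, integer(1), min_build = 24, min_days = 4)
)
res <- res[order(res$Year, res$Month), ]   # split() orders groups by month first
rownames(res) <- NULL
sum(res$Value)   # 8 runs, versus 6 when splitting by year only
```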

MrFlick