I need to calculate the max value that occurs between the beginning of the day and the moment when the min value happened. This is a toy example of my dataset, with two days for one dendro and one day for another:

             TIMESTAMP year DOY ring dendro diameter
1  2013-05-02 00:00:00 2013 122    1      1     3405
2  2013-05-02 00:15:00 2013 122    1      1     3317
3  2013-05-02 00:30:00 2013 122    1      1     3217
4  2013-05-02 00:45:00 2013 122    1      1     3026
5  2013-05-02 01:00:00 2013 122    1      1     4438
6  2013-05-03 00:00:00 2013 123    1      1     3444
7  2013-05-03 00:15:00 2013 123    1      1     3410
8  2013-05-03 00:30:00 2013 123    1      1     3168
9  2013-05-03 00:45:00 2013 123    1      1     3373
10 2013-05-02 00:00:00 2013 122    2      4     5590
11 2013-05-02 00:15:00 2013 122    2      4     5602
12 2013-05-02 00:30:00 2013 122    2      4     5515
13 2013-05-02 00:45:00 2013 122    2      4     4509
14 2013-05-02 01:00:00 2013 122    2      4     5566
15 2013-05-02 01:15:00 2013 122    2      4     6529

First, I calculated the min diameter for each day (DOY = day of the year) in each dendro (each dendro is contained in one ring), also getting the time at which that min value happened:

library(plyr)
dailymin <- ddply(datamelt, .(year, DOY, ring, dendro),function(x)x[which.min(x$diameter), ])

Now, my problem is that I want to calculate the max diameter for each day. However, sometimes the max value occurs after the min value. I am only interested in the max value that occurs BEFORE the min value, not in the overall max if it happened after the min. Therefore, for each day, I need the max value within the time interval from the beginning of the day (00:00:00) to the time of the min diameter. As with the min, I also need to know at what time that max value happened. This is what I want from the previous df:

  year DOY ring dendro             timeMin  min             timeMax  max
1 2013 122    1      1 2013-05-02 00:45:00 3026 2013-05-02 00:00:00 3405
2 2013 123    1      1 2013-05-03 00:30:00 3168 2013-05-03 00:00:00 3444
3 2013 122    2      4 2013-05-02 00:45:00 4509 2013-05-02 00:15:00 5602

As you can see, the min value is the actual min value of the day. However, the max value I want is not the max value of the day; it is the max value that occurred between the beginning of the day and the min value. My first, unsuccessful attempt returns the max value of the day, even if it falls outside the desired time interval:

    dailymax <- ddply(datamelt, .(year, DOY, ring, dendro),
function(x)x[which.max(x$diameter[1:which.min(datamelt$diameter)]), ]) 

Any ideas?

fede_luppi
  • I think your question could be clarified if you provide example data for three or four days showing the desired result. Perhaps just provide five rows of data for each day. – Mark Miller Sep 27 '13 at 08:37
  • I think a better toy example would leave out rownames and either drop the constant variables (year, DOY, ring, dendro) or create some variation (since we're supposed to be grouping by them). – Frank Sep 27 '13 at 13:20
  • Please provide the `dput` version of the data. – Metrics Sep 30 '13 at 23:22

1 Answer

In a data.table, you could write:

DT[,{
  istar <- which.min(diameter)
  list(
    dmin=diameter[istar],
    prevmax=max(diameter[1:istar])
)},by='year,DOY,ring,dendro']

#    year DOY ring dendro dmin prevmax
# 1: 2013 242    6      8  470   477.2

I assume that a similar function can be written with your **ply function (e.g. ddply).
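For what it's worth, here is a rough sketch of what the plyr version might look like. It is not tested on your real data: the `datamelt` below is rebuilt by hand from two of the day/dendro groups in your toy example, the timestamps are kept as plain strings, and the result name `dailyext` is just a placeholder I made up. It also assumes the rows within each group are already in time order.

```r
library(plyr)

# Toy data rebuilt from the question (DOY 122 only; timestamps kept as strings)
times <- c("00:00:00", "00:15:00", "00:30:00", "00:45:00", "01:00:00", "01:15:00")
datamelt <- data.frame(
  TIMESTAMP = paste("2013-05-02", c(times[1:5], times[1:6])),
  year      = 2013,
  DOY       = 122,
  ring      = rep(c(1, 2), times = c(5, 6)),
  dendro    = rep(c(1, 4), times = c(5, 6)),
  diameter  = c(3405, 3317, 3217, 3026, 4438,
                5590, 5602, 5515, 4509, 5566, 6529),
  stringsAsFactors = FALSE
)

# For each group: find the row of the min, then the max among rows up to it
dailyext <- ddply(datamelt, .(year, DOY, ring, dendro), function(x) {
  imin <- which.min(x$diameter)                  # position of the daily min
  imax <- which.max(x$diameter[seq_len(imin)])   # max restricted to rows 1..imin
  data.frame(
    timeMin = x$TIMESTAMP[imin], min = x$diameter[imin],
    timeMax = x$TIMESTAMP[imax], max = x$diameter[imax],
    stringsAsFactors = FALSE
  )
})
```

The key difference from the attempt in the question is that both `which.min` and `which.max` work on `x` (the chunk for one group), not on the whole `datamelt`.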

EDIT1: where DT comes from...

require(data.table)
# read.table() parses the pasted text (the unnamed first field becomes row names);
# data.table() then converts the resulting data.frame
DT <- data.table(read.table(header=TRUE, text='
date TIMESTAMP year DOY ring dendro diameter
1928419 2013-08-30 00:00:00 2013 242    6      8    471.5
1928420 2013-08-30 01:30:00 2013 242    6      8    477.2
1928421 2013-08-30 03:00:00 2013 242    6      8    474.7
1928422 2013-08-30 04:30:00 2013 242    6      8    470.0
1928423 2013-08-30 06:00:00 2013 242    6      8    475.6
1928424 2013-08-30 08:30:00 2013 242    6      8    478.7'))

Your "TIMESTAMP" has a space in it, so I'm reading it as two columns, with the first called "date". Paste them together if you like. Next time, you can look into making a "reproducible example", as described here: How to make a great R reproducible example?
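If you do want to paste them together, something along these lines should work. This is only a sketch: the two-row table is made up for illustration, the new column name `datetime` is my own choice, and the `tz = "UTC"` setting is an assumption (use whatever time zone your logger actually records in).

```r
library(data.table)

# Minimal stand-in for DT with the date and time in separate columns
DT <- data.table(
  date      = c("2013-08-30", "2013-08-30"),
  TIMESTAMP = c("00:00:00", "01:30:00")
)

# Paste the two pieces together and parse them into one date-time column
DT[, datetime := as.POSIXct(paste(date, TIMESTAMP),
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")]
```

With a proper date-time column you can also sort each group by time before taking the min and max, which matters if the rows are not already in order.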

EDIT2: For the time of the max and min:

DT[,{
  istar   <- which.min(diameter)
  istar2  <- which.max(diameter[1:istar])
  list(
    dmin     = diameter[istar],
    tmin     = TIMESTAMP[istar],
    dmax     = diameter[istar2],
    tmax     = TIMESTAMP[istar2]
)},by='year,DOY,ring,dendro']

#    year DOY ring dendro dmin     tmin  dmax     tmax
# 1: 2013 242    6      8  470 04:30:00 477.2 01:30:00

As mentioned in EDIT1, I don't have both pieces of your TIMESTAMP variable in a single column because you did not provide them that way. To add more columns, just add new expressions in the list() above. The idea behind the code is that the {} expression is a code block where you can work with the variables in the chunk of data associated with each year,DOY,ring,dendro combination and return a list of new columns.
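To illustrate the "just add new expressions" part, here is a small self-contained sketch. The extra column `shrinkage` (the difference between the earlier max and the min) is a made-up example of my own, not something from your question, and the table only holds the six diameters from EDIT1.

```r
library(data.table)

# Same six diameters as in EDIT1, for a single year/DOY/ring/dendro group
DT <- data.table(
  year = 2013, DOY = 242, ring = 6, dendro = 8,
  diameter = c(471.5, 477.2, 474.7, 470.0, 475.6, 478.7)
)

res <- DT[, {
  imin <- which.min(diameter)                  # position of the min in this group
  imax <- which.max(diameter[seq_len(imin)])   # max among rows up to the min
  list(
    dmin      = diameter[imin],
    dmax      = diameter[imax],
    shrinkage = diameter[imax] - diameter[imin]  # hypothetical extra column
  )
}, by = "year,DOY,ring,dendro"]
```

Anything computed inside the braces (like `imin` and `imax`) is available to every expression in the list, so adding a column is a one-line change.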

Frank
  • I am not familiar with data.table. When I used your function I got this: `Error in `[.data.frame`(datamelt, , { : unused argument (by = "year,DOY,ring,dendro")`. In addition, how can I get the time of the max and min like in the desired example? – fede_luppi Sep 29 '13 at 23:19
  • @fede_luppi Oh, sorry, I've edited it to include an explanation of how to read in your data to DT; and how to add the time min and maxes as you mentioned in your question. I think that error will go away if you run `require(data.table)` and then `DT <- data.table(datamelt)`. I'm surprised no one has come by with a `plyr` answer yet... – Frank Sep 30 '13 at 00:27