-4

I have a numerical variable, call it "Blah". Blah is measured at various time intervals throughout the day and is an always increasing count. I want to find the difference between the first and last observation of Blah for each day and produce a table of the Total Amount of Increase of Blah per day.

Slightly more complicated is that if Blah is high enough, it will reset to a very low number. This always happens at the same (currently unknown) number and at maximum rate of once per day.

A few more details that might be important:

Blah is measured at different named locations as well. I would like a dataframe of day totals by location. :)

The time variable is in format "mm/dd/yyyy hh:mm:ss"

This is what I've come up with for a general outline. An issue I have is that I haven't worked with POSIXct objects much and don't know how to go about grabbing these values and making this happen.

A<-first value of Day
B<-last value of Day
C<-Maximum value of Blah from a day where reset happens (last value before reset)

For (each Day)
   For (each Location)

     If A < B 
        Then 
           DayTotal = B-A
        Else
            DayTotal = (C-A)+B

Edit:

I had some data here in the wrong format. The below is the correct format.


Thank you in advance for the help!

-Michael

Also, on a day where Blah resets, A will always be more than B.

EDIT NUMBER 2

OMG I am a terrible person. The data actually looks like this

   DESCRIPTION  rawCount   localDateTime
1   Arch Exit    33166  2014-05-23 07:55:05
2   Arch Exit    33167  2014-05-23 08:00:06
3   Arch Exit    33170  2014-05-23 08:10:06
4   Arch Exit    33173  2014-05-23 08:15:05
5   Arch Exit    33175  2014-05-23 08:20:05
6   Arch Exit    33178  2014-05-23 08:25:06
7   Northside    48073  2014-05-24 15:01:40
8   Northside    48119  2014-05-24 15:05:49
9   Northside    48167  2014-05-24 15:10:59
10  Northside    48237  2014-05-24 15:20:49
11  Northside       73  2014-05-24 15:25:59
12  Northside      350  2014-05-24 15:35:49
13  Northside     1430  2014-05-24 15:44:06
14  Northside     2554  2014-05-24 16:00:49

(supposing that the above data was complete per day) I would like the results to look like

DESCRIPTION  totalCount     Date
Arch Exit       12       2014-05-23
Northside      2718      2014-05-23

Another Edit

Ok so using the answer below I did the following which I think made it work.

rawDiff is an already existing variable (done in excel....yikes) and parse_date_time is a function from the Lubridridate package, "Full" is my data and "localdate" is the date variable I wanted.

blahblah<-with(Full, tapply(rawDiff, list(parse_date_time(Full$localDate, "mdy"),          DESCRIPTION), function(x) {
sum(x[x>=0])}))

There was some weirdness with NA's that using a separate pre-made difference variable seemed to help. Also, when it reset the differences were negative so I just took the non-negative differences.

Michael
  • 1,537
  • 6
  • 20
  • 42
  • 2
    You really should include some sample data and the desired output for that data. Read [how to make a great R reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for tips and suggestions. – MrFlick Jun 29 '14 at 17:21
  • ok I tried to edit with what you are requesting. Should have done this in the first place. My bad yo. – Michael Jun 29 '14 at 17:58
  • 3
    Dude. Stop changing your question and read the provided answer yo! It took me a few seconds to change the variable names in @MrFlick's original answer to reflect your new variables and get the answer you're expecting G. – A5C1D2H2I1M1N2O1R2T1 Jul 05 '14 at 16:29
  • dude bro! I did not change the question just what the data looked like. :) – Michael Jul 05 '14 at 16:59
  • @Michael, It seems like this question is still unanswered for you. Can you show where you're still stuck? – A5C1D2H2I1M1N2O1R2T1 Jul 10 '14 at 04:58
  • this question has been answered despite my inability to ask it in the appropriate fashion! – Michael Aug 01 '14 at 15:13
  • I believe this problem has been solved using the code included above. – Michael Sep 19 '14 at 18:33

2 Answers2

3

@MrFlick's answer can easily be adapted to fit your new data, but I'll share a variation to show that since you already have your logic defined, it's pretty easy to translate that almost verbatim.

We start with a simple function that looks at a vector.

myFun <- function(x) {
  A <- x[1]                    # What's the first value?
  B <- x[length(x)]            # What's the last value?
  if (B < A) {                 # If the last value is less than the first
    FLAG <- which(diff(x) < 0) # Identify where the value changes...
    C <- x[FLAG]               # ... and extract it
    C - A + B                  # Calculate according to your defined logic
  } else {                     # Otherwise, if things look straightforward
    B - A                      # Just calculate the difference
  }
}

Once you have that function, you can use one of the many "aggregation" functions available in R: tapply, by, or aggregate, for example. These aggregation functions would take care of the "for each day, for each location" part of your question logic.

Here's aggregate since it most closely matches your desired output:

aggregate(rawCount ~ DESCRIPTION + as.Date(localDateTime), mydf, myFun)
#   DESCRIPTION as.Date(localDateTime) rawCount
# 1   Arch Exit             2014-05-23       12
# 2   Northside             2014-05-24     2718

For this, I've used the following sample data:

mydf <- structure(list(
  DESCRIPTION = c("Arch Exit", "Arch Exit", "Arch Exit", "Arch Exit", 
                  "Arch Exit", "Arch Exit", "Northside", "Northside", 
                  "Northside", "Northside", "Northside", "Northside", 
                  "Northside", "Northside"), 
  rawCount = c(33166L, 33167L, 33170L, 33173L, 33175L, 33178L, 48073L, 
               48119L, 48167L, 48237L, 73L, 350L, 1430L, 2554L), 
  localDateTime = structure(c(1400831705, 1400832006, 1400832606, 
                              1400832905, 1400833205, 1400833506, 
                              1400943700, 1400943949, 1400944259, 
                              1400944849, 1400945159, 1400945749, 
                              1400946246, 1400947249), 
                            class = c("POSIXct", "POSIXt"), tzone = "GMT")), 
                  .Names = c("DESCRIPTION", "rawCount", "localDateTime"), 
                  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", 
                                "9", "10", "11", "12", "13", "14"), 
                  class = "data.frame")
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • seriously, thanks for all the help. It is lovely that this site has so many people willing to help because searching forums can get pretty maddening. – Michael Jul 05 '14 at 17:09
  • @Michael, no problem. *Do* try to provide reproducible examples of your data though, like I did in my answer and like MrFlick did in his. It makes it easier for others to be able to copy-and-paste into their environments and check that the answer works as you would expect it to, making everyone's work easier. And I hope you didn't mind my dude and yo mimicry :-) – A5C1D2H2I1M1N2O1R2T1 Jul 05 '14 at 17:13
  • hahaha.....definitely not. I am not easily phased! I enjoy a little playful ribbing. Ah, by I see what you are saying about data being reproducible. I will do that in the future. – Michael Jul 05 '14 at 17:57
1

When asking for help like this, it's very useful to supply sample data and desired output. Since you didn't provide one, i'll use this (updated to match variable names in Edit 2)

#sample data
set.seed(15)
dd<-data.frame(
    DESCRIPTION=rep(letters[1:3], 9*5),
    rawCount=cumsum(rpois(3*5*9, 4)) %% 75,
    localDateTime=rep(seq(as.POSIXct("2001-01-01"), as.POSIXct("2001-01-03"), 
        by="6 hours"), each=5*3)
)

I'll also define a helper function that will drop times from POSIXct value by downcasting to a simple "Date" class

droptime<-as.Date

Then we can do

with(dd, tapply(rawCount, list(droptime(localDateTime), DESCRIPTION), function(x) {
    d <- diff(x)
    d[d<0] <- tail(x,-1)[d<0]
    sum(d)
}))

or to get the form in Edit 2

aggregate(rawCount~droptime(localDateTime)+DESCRIPTION, dd, FUN=function(x) {
    d <- diff(x)
    d[d<0] <- tail(x,-1)[d<0]
    sum(d)
})

What this does is for every location/date combination, it will calculate the range of values. I slightly rewrote your definition to look at pairwise differences and if the difference is ever negative, assume we've started over again at zero (this would allow for the case of the number resetting twice even though you way that will not happen). The tapply verison will return a matrix of the form

             a   b   c
2001-01-01 221 233 243
2001-01-02 230 232 219
2001-01-03  32  34  36

with the string version of the Date values as rownames and the locations as colnames or

  droptime(localDateTime) DESCRIPTION rawCount
1              2001-01-01           a      221
2              2001-01-02           a      230
3              2001-01-03           a       32
4              2001-01-01           b      233
5              2001-01-02           b      232
6              2001-01-03           b       34
7              2001-01-01           c      243
8              2001-01-02           c      219
9              2001-01-03           c       36

with the aggregate method (here, the class of Date is preserved).


To use with your updated sample data (Edit 1), you could use

sapply(xx[-1], function(x,g) {
    tapply(x, g, function(x) {
        d <- diff(x)
        d[d<0] <- tail(x,-1)[d<0]
        sum(d)
    })  
}, g=xx[[1]])

to get

  06/24/2014 06/25/2014
A          8         52
B          4         57
MrFlick
  • 195,160
  • 17
  • 277
  • 295