
I don't think this question has been asked yet (most similar questions are about extracting data or returning a count). I am new to R, so any help would be appreciated!

I have a dataset of multiple runs of an experiment in one file. Each row holds one time step of one run, with the columns time, [info], and id (unique per run).

I am attempting to calculate when the system reaches equilibrium, which I am defining as stable values in 3 interdependent parameters. I would like to compare the contents of the rows and, if they stay within 5% of each other over 20 timesteps, return the id and the timestep at which the stability begins.
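One way to pin down "within 5% of each other" is to compare every value in a window with the window's first value. The sketch below assumes that reading; `withinPct` and `allStable` are hypothetical helper names, and the columns `A`, `B`, `C` are made-up stand-ins for the three parameters. As Henry points out in the comments, a relative test becomes very strict when values are near 0.

```r
# Hypothetical helper: TRUE when every value in w is within pct (default 5%)
# of the window's first value. If w[1] is 0, only exact zeros pass.
withinPct <- function(w, pct = 0.05) {
  all(abs(w - w[1]) <= pct * abs(w[1]))
}

# Hypothetical helper: a window of rows counts as equilibrium only if
# every chosen parameter column passes withinPct() on those rows.
allStable <- function(df, rows, cols, pct = 0.05) {
  all(vapply(cols, function(col) withinPct(df[[col]][rows], pct), logical(1)))
}

# Toy data with made-up parameter columns A, B, C (not from the question)
toy <- data.frame(A = c(1.00, 1.02, 1.01),
                  B = c(0.50, 0.51, 0.50),
                  C = c(2.00, 2.30, 2.05))
allStable(toy, 1:3, c("A", "B"))       # TRUE
allStable(toy, 1:3, c("A", "B", "C"))  # FALSE: C moves by 15%
```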

So far, I'm thinking it will be something like the following loop (for each run, compare CC at a start step x with later steps y, counting the matches in z), or maybe a while loop:

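tol    <- 0.05  # maximum allowed change in CC
window <- 20    # number of consecutive timesteps that must match
ids      <- unique(Data$ID)
timeToEQ <- rep(NA, length(ids))     # one result per run
for (i in seq_along(ids)) {          # loop over runs (IDs)
  run <- Data[Data$ID == ids[i], ]
  x <- 1  # start of the candidate stable period
  y <- 2  # row being compared with row x
  z <- 0  # number of periods that match so far
  while (y <= nrow(run)) {
    if (abs(run$CC[y] - run$CC[x]) <= tol) {
      y <- y + 1
      z <- z + 1
      if (z >= window) {             # sustained match: save the start time
        timeToEQ[i] <- run$time[x]
        break
      }
    } else {                         # no match for a sustained period, so start over
      x <- x + 1
      y <- x + 1
      z <- 0
    }
  }
}
result <- data.frame(ID = ids, timeToEQ = timeToEQ)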

ETA: CC is one of my parameters of interest; it ranges between 0 and 1, although the endpoints are unlikely.

Here's a simple example that might help; my data looks something like this:

zz <- textConnection("time CC ID 
1          0.99       1
2          0.80       1
3          0.90       1
4          0.91       1
5          0.92       1
6          0.91       1
1          0.99       2
2          0.90       2
3          0.90       2
4          0.91       2
5          0.92       2
6          0.91       2")
Data <- read.table(zz, header = TRUE)
close(zz)

My question is: how can I run through the rows to find out when the value of CC becomes 'stable' (meaning it doesn't change by more than 0.05 over X time steps, here 3), so that the result looks like this:

    ID  timeToEQ
1   1   3
2   2   2

Does this help? The only way I can think of to do this is with a for loop, and I think there must be an easier way!
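A loop-free sketch of the 3-step definition, assuming "stable" means every successive change within the window is below 0.05 (an absolute threshold, not a relative one). `firstStableRun` is a hypothetical helper name; it uses base R's `diff()` and `rle()` to find the first run of small changes that is long enough, and `split()` plus `sapply()` to repeat that for each ID.

```r
# Hypothetical helper: first row from which each of the next (steps - 1)
# successive changes in cc is smaller than tol; NA if there is none.
firstStableRun <- function(cc, steps = 3, tol = 0.05) {
  stopifnot(steps >= 2)
  ok     <- abs(diff(cc)) < tol        # ok[i] compares cc[i] and cc[i + 1]
  runs   <- rle(ok)                    # runs of consecutive TRUE/FALSE
  ends   <- cumsum(runs$lengths)       # last index of each run
  starts <- ends - runs$lengths + 1    # first index of each run
  hit <- which(runs$values & runs$lengths >= steps - 1)
  if (length(hit) == 0) NA_integer_ else as.integer(starts[hit[1]])
}

# The example data from the question
Data <- read.table(header = TRUE, text = "
time CC   ID
1    0.99 1
2    0.80 1
3    0.90 1
4    0.91 1
5    0.92 1
6    0.91 1
1    0.99 2
2    0.90 2
3    0.90 2
4    0.91 2
5    0.92 2
6    0.91 2")

timeToEQ <- sapply(split(Data$CC, Data$ID), firstStableRun, steps = 3)
data.frame(ID = as.integer(names(timeToEQ)), timeToEQ = unname(timeToEQ))
```

The result is a row index within each run; it equals the time step here only because `time` starts at 1 and increases by 1.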

jean
  • Is there any chance the parameters are 0 (which would make being within 5% a rather tight requirement)? – Henry Dec 11 '11 at 19:57
  • There is a nonzero chance, but it's unlikely. I've toyed with different definitions of equilibrium, but until I can figure out how to calculate it, I'm stuck. – jean Dec 11 '11 at 20:29
  • Maybe you could include a small reproducible example (http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and show us what you've got so far? – Roman Luštrik Dec 11 '11 at 20:47
  • I added more explanation; I hope that helps. Thanks for the link; this is my first time posting here. – jean Dec 11 '11 at 21:53
  • Have you looked at `rollapply` in the zoo package? – Ari B. Friedman Dec 11 '11 at 23:01
  • I hadn't heard of rollapply and am looking into it, but I'm struggling with how to define the function. I'm going to keep searching. Thank you! – jean Dec 12 '11 at 02:26
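`rollapply` applies a function over a sliding window. The base-R sketch below (hypothetical helper names `stableWindows` and `firstStableWindow`) assumes a window counts as stable when its range, max minus min, is at most 0.05; it uses `embed()` to build the windows, so it runs without extra packages. With zoo installed, `zoo::rollapply(cc, width = 3, FUN = function(v) diff(range(v)) <= 0.05, align = "left")` should give the same logical vector.

```r
# Hypothetical helper: TRUE for each window of `width` consecutive values
# whose range (max - min) is at most tol.
stableWindows <- function(cc, width = 3, tol = 0.05) {
  if (length(cc) < width) return(logical(0))
  # embed() puts one window per row (in reverse order, which does not
  # matter for range); row i holds cc[i + width - 1], ..., cc[i].
  w <- embed(cc, width)
  apply(w, 1, function(v) diff(range(v)) <= tol)
}

# Hypothetical helper: start index of the first stable window, NA if none.
firstStableWindow <- function(cc, width = 3, tol = 0.05) {
  which(stableWindows(cc, width, tol))[1]
}

cc <- c(0.99, 0.80, 0.90, 0.91, 0.92, 0.91)  # CC for ID 1 in the question
stableWindows(cc)      # FALSE FALSE TRUE TRUE
firstStableWindow(cc)  # 3
```

Checking the range of the whole window is stricter than checking only neighboring differences: a drift of 0.03 per step passes the neighbor test but fails once the range over the window exceeds 0.05.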

1 Answer


Here is my code. I will post an explanation shortly.

library(plyr)
# For each ID, find the first position where successive CC values differ by less than 0.05
ddply(Data, .(ID), summarize, timeToEQ = Position(isTRUE, abs(diff(CC)) < 0.05))

  ID timeToEQ
1  1        3
2  2        2

EDIT. Here is how it works.

  1. ddply breaks Data into subsets based on ID.
  2. diff(CC) computes the difference between CC of successive rows.
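  3. abs(diff(CC)) < 0.05 returns TRUE wherever the change between successive rows is below 0.05, i.e. where CC has stabilized.
  4. Position returns the index of the first element that satisfies isTRUE. Because element k of diff(CC) compares rows k and k + 1, that index is the row (here, the time step) from which CC first changes by less than 0.05.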
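To see steps 2 to 4 concretely, the sketch below runs the same expressions on the CC values of run 1 from the question's example data. It also shows the limitation jean raises in the comments: `Position` stops at the first small change, so a later jump does not undo the result. The vector `jumpy` is made up for illustration.

```r
cc1 <- c(0.99, 0.80, 0.90, 0.91, 0.92, 0.91)  # CC values for ID 1

d      <- diff(cc1)                 # step 2: change between successive rows
stable <- abs(d) < 0.05             # step 3: FALSE FALSE TRUE TRUE TRUE
first  <- Position(isTRUE, stable)  # step 4: index of the first TRUE, 3

# Limitation: only the first small change matters. Here the jump
# 0.91 -> 0.50 comes after a small change, yet the result is still 1.
jumpy <- c(0.90, 0.91, 0.50, 0.51)
Position(isTRUE, abs(diff(jumpy)) < 0.05)  # 1
```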
Ramnath
  • Thank you for responding! Is there any way to keep comparing down the line, to make sure the data is stable for a given time? I think I could do this in two ways: either make the permitted difference impossibly small so that values must be extremely stable, or find a way (rollapply?) to make sure that the difference stays small for X periods. – jean Dec 12 '11 at 02:30
  • You can do that by replacing `abs(diff(CC)) < 0.05` with an appropriate function that encapsulates your logic of `stability`. If you decide to go with an impossibly small difference, then you just have to change 0.05 to an epsilon value you think is reasonable. Hope this helps. – Ramnath Dec 12 '11 at 02:37
  • OK, I think I am getting close. Can you help me figure out how to do this better? This is what I have, but I'd like to use a loop or something similar to extend it to more lags. I tried writing this as a function, but that completely failed. `ddply(Data, .(ID), summarize, timeToEQ = Position(isTRUE, (abs(diff(CC, lag=1)) < 0.05 & abs(diff(CC, lag=2)) < 0.05 & abs(diff(CC, lag=3)) < 0.05)))` – jean Dec 12 '11 at 03:42
  • can you describe in words, the logic of what you are trying to do? – Ramnath Dec 12 '11 at 04:13
  • What I'm trying to do is require the difference to be less than some value (say 0.05) for a sustained amount of time (perhaps three periods). In the sequence below, if the comparison is only to the immediate neighbor, we get the same result as above. If we require stability for three straight periods, this section never becomes stable. `0.99 0.80 0.90 0.91 0.92 0.97 0.92` – jean Dec 12 '11 at 13:04
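A sketch of the "stable for three straight periods" rule described above, generalizing the `diff(CC, lag = 1)`, `lag = 2`, `lag = 3` attempt to any number of lags. `firstStableLag` is a hypothetical helper name. In the original call the three `diff()` results have different lengths, so `&` recycles the shorter ones (usually with a warning) and the trailing comparisons no longer line up; the sketch truncates every comparison to the same length and combines them with `Reduce()`.

```r
# Hypothetical helper: row k starts a stable period when CC[k + j] is
# within tol of CC[k] for every lag j in 1..lags; NA if no row qualifies.
firstStableLag <- function(cc, lags = 3, tol = 0.05) {
  n <- length(cc) - lags  # rows that have `lags` rows after them
  if (n < 1) return(NA_integer_)
  # element k of diff(cc, lag = j) is cc[k + j] - cc[k]; keep the first n
  checks <- lapply(seq_len(lags),
                   function(j) abs(diff(cc, lag = j))[seq_len(n)] < tol)
  Position(isTRUE, Reduce(`&`, checks), nomatch = NA_integer_)
}

firstStableLag(c(0.99, 0.80, 0.90, 0.91, 0.92, 0.91))        # 3
firstStableLag(c(0.99, 0.80, 0.90, 0.91, 0.92, 0.97, 0.92))  # NA
```

Following Ramnath's suggestion, it could replace the inline test in the answer, e.g. `ddply(Data, .(ID), summarize, timeToEQ = firstStableLag(CC, lags = 3))`.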