I'd like to be able to write a cleaner way of doing the following:

I have a data.frame P (5000 rows x 4 columns) and would like to find the median of columns 2, 3 and 4 over the rows whose time stamp in column 1 falls within each range defined by consecutive values of a vector TimeStamp (in seconds).

dput(TimeStamp)
c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)


dput(head(P))
structure(list(Time = c(0, 5, 100, 200, 500, 1200), SkinTemp = c(27.781, 
27.78, 27.779, 27.779, 27.778, 27.777), HeartRate = c(70, 70, 
70, 70, 70, 70), RespirationRate = c(10, 10, 10, 10, 10, 10)), .Names = c("Time", 
"SkinTemp", "HeartRate", "RespirationRate"), row.names = c(NA, 
6L), class = "data.frame")

e.g.

for x<i<y in P[,1]
     find median of all values in P[,2], P[,3] and P[,4]
     Put median values into a new matrix with headers SkinTemp, HeartRate and RespirationRate
end
Jaap
HCAI
  • Try `aggregate(P[,-1],list(Time=findInterval(P$Time,TimeStamp)),median)`. – nicola Jan 10 '17 at 08:54
  • Hi nicola, thank you for such a quick reply. What does the -1 mean in P[,-1]? – HCAI Jan 10 '17 at 08:56
  • It means that the `aggregate` call does not include the first column of the input (in this case the old `Time` variable). – LAP Jan 10 '17 at 08:59

2 Answers

You can try:

aggregate(P[,-1],list(Time=findInterval(P$Time,TimeStamp)),median)
#  Time SkinTemp HeartRate RespirationRate
#1    0  27.7805        70              10
#2    1  27.7790        70              10
#3    2  27.7790        70              10
#4    3  27.7780        70              10
#5    5  27.7770        70              10

You want to group the Time values by the interval they fall into. R has a function that does exactly this: `findInterval`. So we compute the interval index of each Time value, then aggregate the remaining columns by that index and take the median of each group.
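To see the binning step on its own, here is a minimal sketch that rebuilds the six rows of `head(P)` and the `TimeStamp` vector from the question. The names `bin` and `med` are made up for illustration.

```r
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
P <- data.frame(
  Time            = c(0, 5, 100, 200, 500, 1200),
  SkinTemp        = c(27.781, 27.78, 27.779, 27.779, 27.778, 27.777),
  HeartRate       = rep(70, 6),
  RespirationRate = rep(10, 6)
)

# findInterval() counts, for each Time, how many time stamps are <= it.
# 0 means "before the first time stamp"; 5 means the range [798, 1278).
bin <- findInterval(P$Time, TimeStamp)
bin
# [1] 0 0 1 2 3 5

# Median of every column except Time, one row per occupied interval.
med <- aggregate(P[, -1], list(Time = bin), median)
```

Intervals that contain no rows (index 4 here) simply do not appear in the result.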

nicola
  • Thank you very much for this. I'd like to accept your answer as you were the first to comment for me. Is it possible to save the chunks of data that fit between the TimeSteps into a data.frame instead of just calculating the median? – HCAI Jan 10 '17 at 14:02
  • Glad you found this answer useful. I don't think I got what you are asking here. Keep in mind that different requests deserve different questions, so don't be afraid to open a new question. – nicola Jan 10 '17 at 14:16
  • I meant I'd like to pipe the data into a separate data.frame from which your aggregate function calculate the median. I.e. the raw data so I can plot it as separate chunks. E.g. all the data for SkinTemp between TimeStamp 2 and 3. – HCAI Jan 10 '17 at 14:23
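For the follow-up about keeping the raw rows of each interval rather than only their medians, one possible approach (not part of the answer above) is base R's `split` with the same `findInterval` index. This is a sketch on the six sample rows from the question; the name `chunks` is made up for illustration.

```r
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
P <- data.frame(
  Time            = c(0, 5, 100, 200, 500, 1200),
  SkinTemp        = c(27.781, 27.78, 27.779, 27.779, 27.778, 27.777),
  HeartRate       = rep(70, 6),
  RespirationRate = rep(10, 6)
)

# One data.frame per occupied interval, named by the interval index.
chunks <- split(P, findInterval(P$Time, TimeStamp))
names(chunks)
# [1] "0" "1" "2" "3" "5"

# Raw SkinTemp values with TimeStamp[2] <= Time < TimeStamp[3].
chunks[["2"]]$SkinTemp
```

Each element of `chunks` is an ordinary data.frame, so it can be plotted or summarised separately.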

Another option would be to use the `cut` function:

P$new <- cut(P$Time, breaks = c(-Inf, TimeStamp, Inf))
aggregate(. ~ new, P, median)

#             new   Time SkinTemp HeartRate RespirationRate
#1      (-Inf,18]    2.5  27.7805        70              10
#2       (18,138]  100.0  27.7790        70              10
#3      (138,438]  200.0  27.7790        70              10
#4      (438,678]  500.0  27.7780        70              10
#5 (798,1.28e+03] 1200.0  27.7770        70              10
Sotos
  • Thank you for your answer. I can't quite work out why the results this produces differ from the results that @nicola's answer gives. They vary by about 5%, so one of the answers must not be binning the data quite right... but I can't work out how – HCAI Jan 10 '17 at 14:10
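One likely cause of the difference is how the two functions treat values that sit exactly on a time stamp: by default `findInterval` uses left-closed intervals `[a, b)`, while `cut` uses right-closed intervals `(a, b]`. Whether this explains the full 5% on the real data is an assumption. The sketch below uses a hypothetical vector `x` with two values exactly on a break; `fi`, `ct` and `ct_left` are illustrative names.

```r
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
breaks    <- c(-Inf, TimeStamp, Inf)
x <- c(17, 18, 19, 138)   # hypothetical times; 18 and 138 are exact breaks

# Left-closed bins: 18 belongs to [18, 138).
fi <- findInterval(x, TimeStamp)
fi
# [1] 0 1 1 2

# cut() default (right = TRUE): 18 belongs to (-Inf, 18].
ct <- as.integer(cut(x, breaks)) - 1L
ct
# [1] 0 0 1 1

# With right = FALSE, cut() bins the same way as findInterval().
ct_left <- as.integer(cut(x, breaks, right = FALSE)) - 1L
ct_left
# [1] 0 1 1 2
```

So passing `right = FALSE` to `cut` should make the two binnings agree. Note also that `aggregate(. ~ new, P, median)` takes the median of `Time` as well, which the `findInterval` version leaves out.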