Complex time-based subsetting in R

Question

I have a lot of temporal data (YYYY/MM/DD HH:MM:SS:SSS) recorded at irregular intervals on the order of thousandths of a second. At each time point there are ten spatial measurements, each with X, Y, and Z values.

What I'd like to do is take a subset of the data, such as the first group of ten spatial measurements every half second (or some other fraction of a second).

I'm fairly new to R so any help would be greatly appreciated!

Below is an example of 2 measurement times:

2012/09/21 14:59:07:712,A,0.036,0.224,0.814
2012/09/21 14:59:07:712,B,0.042,0.057,0.934
2012/09/21 14:59:07:712,C,-0.104,0.008,0.930
2012/09/21 14:59:07:712,D,0.158,0.001,0.914
2012/09/21 14:59:07:712,E,-0.208,-0.168,0.778
2012/09/21 14:59:07:712,F,-0.185,0.087,0.748
2012/09/21 14:59:07:712,G,-0.176,0.155,0.738
2012/09/21 14:59:07:712,H,0.236,-0.171,0.790
2012/09/21 14:59:07:712,I,0.244,0.076,0.732
2012/09/21 14:59:07:712,J,0.248,0.137, 0.722
2012/09/21 14:59:07:848,A,0.036,0.224,0.814
2012/09/21 14:59:07:848,B,0.042,0.057,0.934
2012/09/21 14:59:07:848,C,-0.104,0.008,0.930
2012/09/21 14:59:07:848,D,0.158,0.001,0.914
2012/09/21 14:59:07:848,E,-0.208,-0.168,0.778
2012/09/21 14:59:07:848,F,-0.185,0.087,0.748
2012/09/21 14:59:07:848,G,-0.176,0.155,0.738
2012/09/21 14:59:07:848,H,0.236,-0.171,0.790
2012/09/21 14:59:07:848,I,0.244,0.076,0.732
2012/09/21 14:59:07:848,J,0.248,0.137, 0.722

Welcome to Stack Overflow! Please give better sample data or a reproducible example so that people here can help you better. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Also include what you have tried so far. This isn't a place to get your work done for free. — CHP, Mar 20 '13 at 05:42

score 1 · Answer 1 · answered Mar 20 '13 at 07:15

It is not clear what you want to do, but you can start by reading in your data. Since this is an irregular time series in long format that contains a grouping variable (the letter column), it does not fit directly into handy packages like zoo or xts, which store their data as a matrix. You can, however, use fread from the data.table package:

library(data.table)
dat <- fread('2012/09/21 14:59:07:712,A,0.036,0.224,0.814
2012/09/21 14:59:07:712,B,0.042,0.057,0.934
2012/09/21 14:59:07:712,C,-0.104,0.008,0.930
2012/09/21 14:59:07:712,D,0.158,0.001,0.914
2012/09/21 14:59:07:712,E,-0.208,-0.168,0.778
2012/09/21 14:59:07:712,F,-0.185,0.087,0.748
2012/09/21 14:59:07:712,G,-0.176,0.155,0.738
2012/09/21 14:59:07:712,H,0.236,-0.171,0.790
2012/09/21 14:59:07:712,I,0.244,0.076,0.732
2012/09/21 14:59:07:712,J,0.248,0.137, 0.722
2012/09/21 14:59:07:848,A,0.036,0.224,0.814
2012/09/21 14:59:07:848,B,0.042,0.057,0.934
2012/09/21 14:59:07:848,C,-0.104,0.008,0.930
2012/09/21 14:59:07:848,D,0.158,0.001,0.914
2012/09/21 14:59:07:848,E,-0.208,-0.168,0.778
2012/09/21 14:59:07:848,F,-0.185,0.087,0.748
2012/09/21 14:59:07:848,G,-0.176,0.155,0.738
2012/09/21 14:59:07:848,H,0.236,-0.171,0.790
2012/09/21 14:59:07:848,I,0.244,0.076,0.732
2012/09/21 14:59:07:848,J,0.248,0.137, 0.722',header=FALSE)

Now you can work with the resulting structure. For example, to get the first 5 groups (A to E), you can do this:

 dat[V2 %in% LETTERS[1:5],]
                         V1 V2     V3     V4    V5
 1: 2012/09/21 14:59:07:712  A  0.036  0.224 0.814
 2: 2012/09/21 14:59:07:712  B  0.042  0.057 0.934
 3: 2012/09/21 14:59:07:712  C -0.104  0.008 0.930
 4: 2012/09/21 14:59:07:712  D  0.158  0.001 0.914
 5: 2012/09/21 14:59:07:712  E -0.208 -0.168 0.778
 6: 2012/09/21 14:59:07:848  A  0.036  0.224 0.814
 7: 2012/09/21 14:59:07:848  B  0.042  0.057 0.934
 8: 2012/09/21 14:59:07:848  C -0.104  0.008 0.930
 9: 2012/09/21 14:59:07:848  D  0.158  0.001 0.914
10: 2012/09/21 14:59:07:848  E -0.208 -0.168 0.778
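Before subsetting by time, the first column has to become something numeric. The sketch below is one possible way to do that in base R, not part of the original answer: `to_millis` is a hypothetical helper name, and the UTC time zone is an assumption, since the question does not say which zone the data was recorded in. It builds whole milliseconds from the seconds and millisecond parts separately, which avoids floating-point rounding of fractional seconds.

```r
# Hypothetical helper (not from the original answer): convert stamps such as
# "2012/09/21 14:59:07:712" to whole milliseconds since the Unix epoch.
# The time zone is an assumption; adjust tz to match your recording setup.
to_millis <- function(stamp, tz = "UTC") {
  whole <- sub(":[0-9]{3}$", "", stamp)                        # "2012/09/21 14:59:07"
  frac  <- as.integer(sub("^.*:([0-9]{3})$", "\\1", stamp))    # 712
  secs  <- as.numeric(as.POSIXct(whole, format = "%Y/%m/%d %H:%M:%S", tz = tz))
  secs * 1000 + frac
}

stamps <- c("2012/09/21 14:59:07:712", "2012/09/21 14:59:07:848")
ms <- to_millis(stamps)
diff(ms)  # gap between the two measurement times in milliseconds: 136
```

With the data.table from above, something like `dat[, ms := to_millis(V1)]` should add this as a new column.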

Hello again. Let me try to better explain what I'm trying to do. I can load the data into R without any problem (I'm using Deducer). I can also convert the 1st column from character to time. What I'm having trouble with is figuring out how to select a subset of the data based upon the time. For instance, suppose I'd like to make a subset of the first instance of each set of ten samples (i.e., A-J, the 10 rows all measured at the same time) every tenth of a second. Does that make more sense? — Concept Delta, Mar 21 '13 at 05:09
I'm guessing that I need to use some type of looping structure, or perhaps Wickham's plyr package. I know the subset function lets you create a subset based upon specific values in a row, but I can't figure out how to adjust the conditional statement so that it selects the first occurrence within a fixed time window (e.g., the first occurrence within each half-second 'bin'). — Concept Delta, Mar 21 '13 at 05:17
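The "first occurrence within each bin" idea described in these comments can be done without an explicit loop. What follows is a minimal base R sketch under stated assumptions: the toy data is made up (two joints instead of ten, and the third timestamp is invented so that a second bin exists), the column names `stamp`, `joint` and `x` are placeholders, and UTC is assumed as the time zone. Each row is assigned to a fixed-width bin by integer division, and only the rows sharing the earliest timestamp in each bin are kept.

```r
# Made-up toy data: three measurement times, two joints each
# (the real data has ten rows, A-J, per measurement time).
dat <- data.frame(
  stamp = rep(c("2012/09/21 14:59:07:712",
                "2012/09/21 14:59:07:848",
                "2012/09/21 14:59:08:130"), each = 2),
  joint = rep(c("A", "B"), times = 3),
  x     = c(0.036, 0.042, 0.036, 0.042, 0.040, 0.045),
  stringsAsFactors = FALSE
)

bin_ms <- 500  # bin width in milliseconds; 100 would give tenth-of-a-second bins

# Whole milliseconds since the epoch, built from the seconds and millisecond parts
whole  <- sub(":[0-9]{3}$", "", dat$stamp)
frac   <- as.integer(sub("^.*:([0-9]{3})$", "\\1", dat$stamp))
dat$ms <- as.numeric(as.POSIXct(whole, format = "%Y/%m/%d %H:%M:%S",
                                tz = "UTC")) * 1000 + frac

dat$bin      <- dat$ms %/% bin_ms                 # index of the bin each row falls in
first_ms     <- ave(dat$ms, dat$bin, FUN = min)   # earliest time within each bin
first_groups <- dat[dat$ms == first_ms, ]         # keep every row recorded at that time

first_groups
```

Here the rows stamped 14:59:07:848 are dropped because 14:59:07:712 is the earlier time in the same half-second bin. Changing `bin_ms` is the only adjustment needed for a different window.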

score 0 · Answer 2 · answered Oct 20 '13 at 01:06

Here's the solution I came up with. It solved the problem, with one weakness: it is stuck creating moving averages at 1-second intervals.

library(plyr)

# Average each joint's position within each recorded time
data_ID_P001 <- ddply(data_ID_P001, .(time_recorded, joint), summarise,
                      average_x_pos = mean(x_pos),
                      average_y_pos = mean(y_pos),
                      average_z_pos = mean(z_pos))
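One possible way around the 1-second limitation is to group on a computed bin column instead of on the recorded time itself. The sketch below is an illustration only: it swaps in base R's `aggregate` for `ddply`, the `toy` data frame is made up, and the `ms` column (time in milliseconds) and `bin_start` column are assumed names that do not appear in the original data.

```r
# Made-up stand-in for data_ID_P001; 'ms' is an assumed column holding the
# measurement time in milliseconds.
toy <- data.frame(
  ms    = rep(c(1000, 1136, 1600), each = 2),
  joint = rep(c("A", "B"), times = 3),
  x_pos = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60),
  y_pos = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0),
  z_pos = c(0.5, 0.5, 0.7, 0.7, 0.9, 0.9)
)

bin_ms <- 500  # averaging window in milliseconds

# Label each row with the start of the bin it falls in
toy$bin_start <- (toy$ms %/% bin_ms) * bin_ms

# Average each joint's position within each bin
averaged <- aggregate(cbind(x_pos, y_pos, z_pos) ~ bin_start + joint,
                      data = toy, FUN = mean)
averaged
```

The same grouping should carry over to plyr by passing `.(bin_start, joint)` as the grouping variables in the `ddply` call.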
