
I have an array of dates and I would like to discard any date that doesn't have at least one other date within a specific time interval, for example 5 minutes. I need a smart way to do it, as loops take forever on a larger dataset.

input data:

2009 07 07 16:01:30

2009 07 07 16:04:06

2009 07 07 16:05:00

2009 07 07 16:12:00

2009 07 07 16:19:43

2009 07 07 16:24:00

results:

2009 07 07 16:01:30

2009 07 07 16:04:06

2009 07 07 16:05:00

2009 07 07 16:19:43

2009 07 07 16:24:00

The value 2009 07 07 16:12:00 was discarded because it was more than 5 minutes away from any other timestamp.

Thanks, Cristi


Secondary issue:

Both Dan and nkjt suggested an implementation that worked, thanks! What if the dates belong to 2 groups, A and B, and I want to check whether each date in group A has a corresponding date in group B within a given number of seconds/minutes? If it doesn't, the date should be removed from group A.

1 Answer


You can use diff. You'll need to use datenum to convert your data into a vector of values. In MATLAB datenums, "1" is a single day, so a step of a given number of time units is that number divided by how many of those units there are in a day. For minutes:

s = num_mins/(24*60);

Here's the trick with diff:

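x = datenum(mydata);
x = x(:).';   % datenum usually returns a column; force a row so the concatenations below work
s = num_mins/(24*60);
% for increasing times we shouldn't need the `abs` but to be safe
d = abs(diff(x));
% q is true where a value is more than s away from both of its neighbours
q = [d (s+1)]>s & [(s+1) d]>s;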

(q is true for the values to discard, so index with ~q to keep the rest. You can use datestr to convert x(~q) back, or apply ~q to the original data.)

How it works:

The output of diff is one shorter than the original - it's just the difference between neighbouring values, so the times should be sorted for the neighbours to be the nearest values. We need it to be directional - to check each value against the one that comes before and the one that comes after.

[d (s+1)]>s makes a vector the same length as the original, and checks if the difference values are larger than s. This checks whether there's a gap between a value and the one following it. Because we set the last element to s+1, the result for the final value is always true.

[(s+1) d]>s does the same but on the other side. Again, we are setting one value, this time the first, to be larger than s so it's always true.

Combining these gives us the points where the difference is more than five minutes on either side (or for the end points, on one side).
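As a rough check, the steps above can be run on the timestamps from the question. This is only a sketch: the format string passed to datenum and the variable name kept are assumptions rather than part of the code above, and the times are assumed to be sorted already.

```matlab
% Timestamps from the question, as a cell array of strings.
mydata = {'2009 07 07 16:01:30'; '2009 07 07 16:04:06'; ...
          '2009 07 07 16:05:00'; '2009 07 07 16:12:00'; ...
          '2009 07 07 16:19:43'; '2009 07 07 16:24:00'};
num_mins = 5;

% Assumed format string matching the layout used in the question.
x = datenum(mydata, 'yyyy mm dd HH:MM:SS');
x = x(:).';                 % force a row vector for the concatenations below
s = num_mins/(24*60);       % 5 minutes expressed in days

d = abs(diff(x));           % gaps between neighbouring timestamps
q = [d (s+1)]>s & [(s+1) d]>s;   % true where both neighbours are more than s away

kept = mydata(~q);          % timestamps with at least one neighbour within s
disp(kept)
```

With this input, q should only be true for 16:12:00, whose gaps to its neighbours are 7 minutes and 7 minutes 43 seconds, so kept should hold the other five timestamps listed in the question.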

nkjt