I am working on a way to split up data in a CSV file based on a timestamp.
For example, for a given object ID, I want to check each entry's date and see whether it falls within an allowed range of the other entries. Suppose a set of rows in the table were:
OBJECT ID   Info   Date
obj1        xyz    1/1/12
obj1        xyw    1/2/12
obj1        cya    1/3/12
obj1        abc    2/1/12
...
In this example, the fourth entry is well outside the time range of the other entries. My desired behavior is therefore for a script to assign that entry to a new object, say 'obj2', so that it is separated from the data in the original cluster. Note that the dataset this will be applied to is fairly large, at least tens of thousands of rows, so I don't know whether a hand-written, row-by-row approach will be fast enough.
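To make the desired behavior concrete, here is a minimal Python sketch of a naive gap-based split. It is only an illustration, not a clustering method like PAM. It assumes the dates are month/day/year, that a gap of more than 7 days starts a new group, and that new groups are labelled with a numeric suffix (obj1_1, obj1_2) rather than a fresh ID like 'obj2'. The function name split_by_gap and the parameters max_gap and date_format are made up for this example.

```python
from datetime import datetime, timedelta


def split_by_gap(rows, max_gap=timedelta(days=7), date_format="%m/%d/%y"):
    """Relabel rows so that a large time gap starts a new group per object.

    rows: iterable of (object_id, info, date_string) tuples.
    Returns a list of (new_label, info, date_string) tuples, sorted by
    object id and date. The 7-day threshold and the month/day/year
    format are assumptions, not something given in the data.
    """
    # Parse each date once, then sort by object id and date so that
    # consecutive rows of the same object are neighbours in time.
    parsed = sorted(
        ((obj, info, datetime.strptime(date, date_format), date)
         for obj, info, date in rows),
        key=lambda r: (r[0], r[2]),
    )

    result = []
    prev_obj, prev_date, group = None, None, 0
    for obj, info, when, date in parsed:
        if obj != prev_obj:
            group = 1                      # first row of a new object
        elif when - prev_date > max_gap:
            group += 1                     # gap too large: start a new group
        result.append((f"{obj}_{group}", info, date))
        prev_obj, prev_date = obj, when
    return result


# The example rows from above.
rows = [
    ("obj1", "xyz", "1/1/12"),
    ("obj1", "xyw", "1/2/12"),
    ("obj1", "cya", "1/3/12"),
    ("obj1", "abc", "2/1/12"),
]
labelled = split_by_gap(rows)
for row in labelled:
    print(row)
```

With these assumptions the first three rows stay together as obj1_1 and the fourth row becomes obj1_2. The work is one sort plus one pass over the rows, which should be cheap at tens of thousands of rows, but it depends entirely on picking a sensible gap threshold by hand.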
I'm using R for the moment to try to get this done, with the pam function (from the cluster package) and the pamk function (from the fpc package). This gives me a plot of the clusters (I think), but I don't know how to apply the cluster assignments back to the rows of the actual data.
Any thoughts or ideas on the best way to do this?