Any idea or suggestion would be appreciated! I have several NumPy objects of the same style (u1, u2, u3, ...), each of which looks like this:

Object 1:

   [[Timestamp('2004-02-28 00:59:16'), 19.9884],
   [Timestamp('2004-02-28 01:03:16'), 19.3024],
   ...
   [Timestamp('2004-02-28 01:06:16'), 19.1652]]

Object 2:

   [[Timestamp('2004-02-28 01:08:17'), 19.567],
   [Timestamp('2004-02-28 01:10:16'), 19.5376],
    ...
   [Timestamp('2004-02-28 01:26:47'), 19.4788]]

I would like to find which of these objects share the same "trends" in their time series by clustering them. I tried several approaches, including:

import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([u1, u2, u3])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)
print(distances)

Some of my errors:

TypeError: float() argument must be a string or a number, not 'Timestamp'

ValueError: setting an array element with a sequence.

TypeError: only size-1 arrays can be converted to Python scalars

Conclusion

Can someone at least give me a suggestion on what I should do? Thanks!

1 Answer

(1) Your first error means that each Timestamp must be converted into a string or a number before it can be treated as a float. Convert them to numbers with .value, which gives nanoseconds since the Unix epoch (1970-01-01); dividing by 1e9 yields seconds. For lists:

u1 = list(map(lambda el: (el[0].value / 1e9, el[1]), u1))
u2 = list(map(lambda el: (el[0].value / 1e9, el[1]), u2))
...
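As a minimal, self-contained sketch of the conversion above, the snippet below rebuilds the three visible rows of "Object 1" from the question and converts them. The helper name `to_numeric` is hypothetical and introduced only for illustration; it assumes each row is a `[Timestamp, value]` pair.

```python
import pandas as pd

# The three rows of "Object 1" that are visible in the question.
u1 = [
    [pd.Timestamp("2004-02-28 00:59:16"), 19.9884],
    [pd.Timestamp("2004-02-28 01:03:16"), 19.3024],
    [pd.Timestamp("2004-02-28 01:06:16"), 19.1652],
]

def to_numeric(rows):
    """Hypothetical helper: turn [Timestamp, value] rows into (seconds, value) tuples."""
    # Timestamp.value is an integer count of nanoseconds since 1970-01-01.
    return [(ts.value / 1e9, value) for ts, value in rows]

u1_num = to_numeric(u1)

# The gap between the first two readings, now a plain float in seconds.
print(u1_num[1][0] - u1_num[0][0])
```

After this step every element is a plain float, so np.array no longer has to convert a Timestamp.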

(2) np.array([u1, u2, u3]) produces a 3D array instead of the 2D array of shape (n_samples, n_features) that NearestNeighbors expects. This may be the cause of the second error (a number was expected, but a sequence was found because of the redundant dimension). Replace it with one of the following:

X = np.array(u1 + u2 + ...)  # for lists
X = pd.concat([u1, u2, ...], axis=0)  # for dataframes
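Putting both fixes together, here is a sketch of the full pipeline for the list case. It assumes only the rows that are visible in the question (three per object, with the elided rows left out), and the helper name `to_numeric` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Visible rows of the two objects from the question (elided rows omitted).
u1 = [
    [pd.Timestamp("2004-02-28 00:59:16"), 19.9884],
    [pd.Timestamp("2004-02-28 01:03:16"), 19.3024],
    [pd.Timestamp("2004-02-28 01:06:16"), 19.1652],
]
u2 = [
    [pd.Timestamp("2004-02-28 01:08:17"), 19.567],
    [pd.Timestamp("2004-02-28 01:10:16"), 19.5376],
    [pd.Timestamp("2004-02-28 01:26:47"), 19.4788],
]

def to_numeric(rows):
    """Hypothetical helper: [Timestamp, value] rows -> (seconds, value) tuples."""
    return [(ts.value / 1e9, value) for ts, value in rows]

# Concatenate the lists so that every reading becomes one row of a 2D array.
X = np.array(to_numeric(u1) + to_numeric(u2))  # shape (6, 2)

nbrs = NearestNeighbors(n_neighbors=2, algorithm="ball_tree").fit(X)
distances, indices = nbrs.kneighbors(X)
print(distances)
```

Each row of X is a single reading, so the neighbours found here are individual points rather than whole objects. Because the first coordinate is in seconds, the distances are dominated by the time gaps between readings.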

With these changes the code runs. Output of print(distances) using your sample data:

[[  0.         240.00098041]
 [  0.         180.00005229]
 [  0.         121.00066712]
 [  0.         119.00000363]
 [  0.         119.00000363]
 [  0.         991.00000174]]
Bill Huang
  • Thanks, that solved my errors. I took a different approach: I resampled my data, kept only the temperature, and made a list of the form [19.4333, 19.23111 ...] (without the timestamps) for each object. Then I built the NumPy array again and computed the kNN the same way. This gave me a similarity matrix between the different objects, which was what I was looking for. Thanks! See also: https://stackoverflow.com/questions/58358110/clustering-similar-time-series –  Oct 19 '20 at 11:18
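The approach described in this comment might look roughly like the sketch below. Everything in it is an assumption made for illustration: the sample data is invented, the helper name `to_vector` is hypothetical, and the 1-minute grid with linear interpolation is just one possible resampling choice, not necessarily what the commenter used.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def make_object(start, values):
    """Build invented [Timestamp, value] rows spaced two minutes apart."""
    t0 = pd.Timestamp(start)
    return [[t0 + pd.Timedelta(minutes=2 * i), v] for i, v in enumerate(values)]

# Invented data: two falling series and one rising series.
u1 = make_object("2004-02-28 00:00:00", [20.0, 19.0, 18.0])
u2 = make_object("2004-02-28 01:00:00", [20.1, 19.1, 18.1])
u3 = make_object("2004-02-28 02:00:00", [18.0, 19.0, 20.0])

def to_vector(rows, freq="1min"):
    """Hypothetical helper: resample to a regular grid and drop the timestamps."""
    series = pd.Series(
        [value for _, value in rows],
        index=pd.DatetimeIndex([ts for ts, _ in rows]),
    )
    # Empty bins become NaN after resampling; fill them by linear interpolation.
    return series.resample(freq).mean().interpolate().to_numpy()

vectors = [to_vector(u) for u in (u1, u2, u3)]
n = min(len(v) for v in vectors)           # truncate to a common length
X = np.vstack([v[:n] for v in vectors])    # one row per object

nbrs = NearestNeighbors(n_neighbors=2, algorithm="ball_tree").fit(X)
distances, indices = nbrs.kneighbors(X)
print(indices[:, 1])  # index of the nearest other object for each object
```

Here each row of X is a whole object rather than a single reading, so the distances compare objects with each other, which is what the comment describes.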