
How would I approach learning changes in speed using RNNs/LSTMs given x,y coordinates of continuous data? (I have to use a recurrent layer as this is a sub-problem of a bigger end-to-end model that does other things too)

Training data example:

x,y,speed_changed
0,0,0
0,0.1,0
0,0.2,0
0,0.3,0
0,0.5,1
0,0.6,0
0,0.7,0
...

So far I have constructed a stateful LSTM and train it on one item per batch. I then reset the state of the LSTM every time there is a change in speed, so that it learns that a segment had the same speed (segments can have different lengths).

How do I then use such a model in production, given that the segments have different lengths?

Or is there a better way to train the recurrent net on such data? Perhaps anomaly detection? (I want to avoid having a fixed batch size, e.g. a window of 3 frames.)

Boris Mocialov
  • This is straightforward to do without any NN at all. Is there a reason you want to do it with an RNN in particular? Is this a real problem, or a learning exercise? LSTMs are usually used when you need to recall information that is temporally distant, so I personally see little point in using them here. Even if you want to use a NN (to effectively learn the Pythagorean theorem...), learning would be greatly sped up if you can use derived features, such as the squares of x and y – loopbackbee Jan 30 '20 at 13:44
  • Also, why wouldn't you want to use a fixed window? It seems very suited to this problem, since you're looking for a change in instantaneous speed – loopbackbee Jan 30 '20 at 13:47
  • @goncalopp this feature should be part of a multi-label model, with every label focusing on different parts of the temporal data and doing either classification or prediction. Detecting speed changes is a requirement for a commercial project. I know how to do it outside the network, but the project is all about end-to-end learning – Boris Mocialov Jan 30 '20 at 15:17
  • @Boris Mocialov, if you know how to do it outside of the network, why wouldn't you simply add this additional, computed column to your data set and use this column as an input to the other parts of your model? As goncalopp wrote, it doesn't make much sense to learn values that could be computed directly unless it's an example or an exercise. – isp-zax Feb 03 '20 at 04:30
  • @isp-zax the reason why it has to be in a network is the end-to-end learning – Boris Mocialov Feb 03 '20 at 10:40
  • The question would benefit from a description of the bigger 'end-to-end learning' project – Poe Dator Feb 04 '20 at 21:36

1 Answer


The structure of RNNs and LSTMs will not let you do this directly, and here is why. The activation function of an RNN is:

h(t) = tanh(W * h(t-1) + U * x(t) + Bias)

Note that W, U and the Bias are the same at every time step, no matter how many time frames you feed the RNN. So for an input vector X, the output is a function of the linear combination p1*X1 + p2*X2 and so forth, where X1 is x in your example and X2 is y.
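To make the weight sharing concrete, here is a minimal pure-Python sketch of the recurrence above. The names `rnn_step` and `run_rnn`, the hidden size of 2, and the weight values are all made up for illustration; they do not come from the question or from any particular library.

```python
import math

def rnn_step(h_prev, x, W, U, bias):
    """One vanilla RNN step: h(t) = tanh(W * h(t-1) + U * x(t) + bias)."""
    hidden = len(bias)
    h = []
    for i in range(hidden):
        z = bias[i]
        z += sum(W[i][j] * h_prev[j] for j in range(hidden))  # recurrent term
        z += sum(U[i][k] * x[k] for k in range(len(x)))       # linear in x and y
        h.append(math.tanh(z))
    return h

def run_rnn(points, W, U, bias):
    """Apply the same W, U and bias at every time step, starting from h = 0."""
    h = [0.0] * len(bias)
    states = []
    for x in points:
        h = rnn_step(h, x, W, U, bias)
        states.append(h)
    return states

# Arbitrary illustrative weights (assumption: hidden size 2, inputs x and y).
W = [[0.5, -0.3], [0.1, 0.2]]
U = [[1.0, -1.0], [0.3, 0.7]]
bias = [0.0, 0.1]

points = [(0.0, 0.0), (0.0, 0.1), (0.0, 0.2)]
states = run_rnn(points, W, U, bias)
```

Inside a single step the inputs only ever enter through the weighted sum `U * x(t)`, which is the point made above.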

However, detecting a change in speed needs a different calculation. A change in speed means that the distance traveled between time frames 1 and 2 differs from the distance traveled between time frames 2 and 3. The distance traveled in one time frame is SQRT((X1(t)-X1(t-1))^2 + (X2(t)-X2(t-1))^2). This means you need an activation function that takes X1*X1 into account in some way, and a standard RNN or LSTM cell does not provide that directly.

However, you could achieve what you need indirectly, by using a custom activation function that calculates the distance traveled over the latest time frame. Take a look at this link. With a custom activation function, you can feed in the vector X1(t), X2(t), X1(t-1), X2(t-1) and calculate the distance D. At t=1 you may use zeros for X1(t=0) and X2(t=0).

Your custom activation function should look like D = (X1(t)-X1(t-1))^2 + (X2(t)-X2(t-1))^2. This is the squared distance; the square root can be dropped because equal distances give equal squares. This way, if the speed stays the same between time frames, the RNN is fed constant D values, so you can expect it to learn weights that approximate a function of D(t) - D(t-1).
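As a rough illustration of the D feature, the sketch below computes it outside any network for the sample rows from the question. The helper names and the tolerance `TOL` are hypothetical, and the first point is compared against zeros as suggested above.

```python
def squared_step_distances(points):
    """D(t) = (x(t)-x(t-1))^2 + (y(t)-y(t-1))^2, using zeros before the first point."""
    prev = (0.0, 0.0)
    result = []
    for x, y in points:
        result.append((x - prev[0]) ** 2 + (y - prev[1]) ** 2)
        prev = (x, y)
    return result

def distance_deltas(d):
    """D(t) - D(t-1); the first entry is set to 0.0 by convention (assumption)."""
    return [0.0] + [b - a for a, b in zip(d, d[1:])]

# Sample rows from the question (x, y), without the label column.
points = [(0, 0), (0, 0.1), (0, 0.2), (0, 0.3), (0, 0.5), (0, 0.6), (0, 0.7)]

D = squared_step_distances(points)
deltas = distance_deltas(D)

TOL = 1e-9  # hypothetical tolerance to absorb floating-point noise
changed = [abs(v) > TOL for v in deltas]
```

On this sample, D is roughly constant along the steady rows and jumps on the row where the step grows to 0.2. Note that D(t) - D(t-1) is also non-zero on the following row, when the step returns to 0.1, and on the second row because of the zero padding, so the labels may need to account for that.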

Roee Anuar