The main purpose of an LSTM is to exploit its memory. Given that, what is the point of a stateless LSTM? Don't we "convert" it into a plain feed-forward NN by using it that way?
In other words: does the stateless use of an LSTM aim to model the sequences (windows) in the input data, provided we pass shuffle=False to the fit() method in Keras (e.g., for a window of 10 time steps, capture any pattern within 10-character words)? If so, why don't we convert the initial input data to match the form of the sequences under inspection and then use a plain NN?
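To make the setup concrete, here is a minimal sketch of the stateless case I mean (the layer sizes and dummy data are my own assumptions, purely for illustration):

```python
import numpy as np
from tensorflow import keras

window, n_features = 10, 1

# Dummy data: 100 windows of 10 time steps each (random values,
# just so the snippet runs)
X = np.random.rand(100, window, n_features)
y = np.random.rand(100)

model = keras.Sequential([
    keras.layers.Input(shape=(window, n_features)),
    keras.layers.LSTM(32),   # stateful=False is the default, i.e. stateless
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# shuffle=False keeps the windows in their original order during training,
# but the hidden state is still reset for every batch (stateless)
model.fit(X, y, batch_size=16, shuffle=False, epochs=2)
```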
If we choose shuffle = True, then we lose any information contained in the ordering of our data (e.g., time series, sequences), don't we? In that case I would expect it to behave similarly to a plain NN and to get the same results from the two models given the same random seed.
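For comparison, this is the plain-NN counterpart I have in mind (continuing the snippet above; again just an illustrative sketch): each window is flattened into a single vector and fed to an ordinary feed-forward net.

```python
# Same data, flattened: each 10-step window becomes one 10-dimensional
# input vector for a plain Dense network
dense_model = keras.Sequential([
    keras.layers.Input(shape=(window * n_features,)),
    keras.layers.Dense(32, activation="tanh"),
    keras.layers.Dense(1),
])
dense_model.compile(optimizer="adam", loss="mse")
dense_model.fit(X.reshape(len(X), -1), y, batch_size=16, shuffle=True, epochs=2)
```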
Am I missing something in my thinking?
Thanks!