Let's say I have a data set (a NumPy array) X of N time series samples, each with T time steps of a D-dimensional vector, so that:

X.shape == (N,T,D)

Now I want to reshape it into x (data set) and y (labels) so that I can apply machine learning to predict the next step in the time series.

I want to take every length-n subseries of each sample, so that

x.shape==(N*(T-n),n,D) and y.shape==(N*(T-n),D)

with

X[k,j:j+n,:]

being one of my samples in x and

X[k,j+n,:]

its label in y.

Is a for-loop the only way to do that?

patapouf_ai

2 Answers

I have the following method, but it uses a for loop, and I am not sure whether I can do better:

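    import numpy as np

    def reshape_data(self, X, n):
        """
        Reshape a data set of N time series samples of T time steps each
        into windows of length n and the step that follows each window.

        Args:
            X: Time series data of shape (N,T,D)
            n: int, length of time window used to predict x[t+1]

        Returns:
            x: windows, shape (N*(T-n),n,D)
            y: labels, shape (N*(T-n),D)
        """
        N,T,D = X.shape

        x = np.zeros((N*(T-n),n,D))
        y = np.zeros((N*(T-n),D))

        # For each start offset i, copy the window and its next step
        # for all N samples at once
        for i in range(T-n):
            x[N*i:N*(i+1),:,:] = X[:,i:i+n,:]
            y[N*i:N*(i+1),:] = X[:,i+n,:]

        return x,y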
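
One possible loop-free variant, as a sketch only: it assumes NumPy 1.20 or later, where `numpy.lib.stride_tricks.sliding_window_view` is available. The function name `reshape_data_vectorized` is made up for illustration, and the row ordering is chosen to match the loop above (all N samples for start offset 0, then offset 1, and so on).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def reshape_data_vectorized(X, n):
    """Hypothetical loop-free counterpart of reshape_data (no self argument)."""
    N, T, D = X.shape

    # Sliding windows along the time axis: shape (N, T-n+1, D, n).
    windows = sliding_window_view(X, n, axis=1)

    # Drop the last window (it has no following step to use as a label),
    # then reorder axes to (T-n, N, n, D) so that rows come out grouped
    # by start offset, as in the loop version.
    x = windows[:, :T - n].transpose(1, 0, 3, 2).reshape(N * (T - n), n, D)

    # The label of the window starting at i is step i+n, i.e. X[:, n:, :].
    y = X[:, n:, :].transpose(1, 0, 2).reshape(N * (T - n), D)

    return x, y
```

The final `reshape` copies the data, so `x` and `y` are ordinary writable arrays rather than read-only views into `X`. Whether this is faster than the loop depends on N, T and n, so it is worth timing on the real data.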
patapouf_ai

You are looking for the pandas Panel (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Panel.html). Just put the NumPy array into a Panel, transpose on the minor axis, and get its NumPy representation (.as_matrix() or simply .values). If you want to do it in NumPy alone, numpy.transpose does just that (https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html).

Asher11