I am trying to create 3 tensors for my language translation LSTM network.
import numpy as np
Num_samples=50000
Time_step=100
Vocabulary=5000
shape = (Num_samples,Time_step,Vocabulary)
encoder_input_data = np.zeros(shape,dtype='float32')
decoder_input_data = np.zeros(shape,dtype='float32')
decoder_target_data = np.zeros(shape,dtype='float32')
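For a sense of scale, here is a rough estimate of what these dense arrays would need, computed without allocating anything. It assumes 4 bytes per float32 element; the lowercase variable names are only for this sketch.

```python
# Sizes taken from the snippet above.
num_samples, time_step, vocabulary = 50000, 100, 5000

# A float32 element occupies 4 bytes.
bytes_per_float32 = 4

# Footprint of one dense array of this shape, computed without allocating it.
n_elements = num_samples * time_step * vocabulary
bytes_per_array = n_elements * bytes_per_float32
gb_per_array = bytes_per_array / 1e9

print(f"{gb_per_array:.0f} GB per array, {3 * gb_per_array:.0f} GB for all three")
```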
Obviously, my machine doesn't have enough memory to do so. Since the data is represented as one-hot vectors, it seems that csc_matrix() from scipy.sparse would be the solution, as suggested in this thread and this thread.
But after trying csc_matrix() and csr_matrix(), it seems they only support 2D arrays.
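The only workaround I can think of is to flatten the first two axes, so that each (sample, time step) pair becomes one row of a 2D sparse matrix, and to densify a small batch at a time. This is only a sketch under that assumption: the sizes are shrunk so it runs anywhere, the token ids are random placeholders, and dense_batch is a hypothetical helper of my own, not a scipy function.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small hypothetical sizes so the sketch runs on any machine.
num_samples, time_step, vocabulary = 4, 3, 10

# Placeholder token ids, one per (sample, time step); real ids would come from a tokenizer.
rng = np.random.default_rng(0)
token_ids = rng.integers(0, vocabulary, size=(num_samples, time_step))

# Flatten (sample, time step) into a single row axis: row = sample * time_step + step.
rows = np.arange(num_samples * time_step)
cols = token_ids.ravel()
vals = np.ones(rows.size, dtype='float32')
one_hot_2d = csr_matrix((vals, (rows, cols)),
                        shape=(num_samples * time_step, vocabulary))

def dense_batch(start, stop):
    """Densify samples [start, stop) into a (batch, time_step, vocabulary) array."""
    block = one_hot_2d[start * time_step:stop * time_step].toarray()
    return block.reshape(stop - start, time_step, vocabulary)

batch = dense_batch(0, 2)
print(batch.shape)
```

This keeps only one stored value per time step, but it still leaves me reshaping by hand for every batch, which is why I would prefer a real 3D sparse structure.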
Old threads from 6 years ago did discuss this issue, but they are not machine-learning oriented.
My question is: is there any Python library or tool that can create sparse 3D arrays, so that I can store one-hot vectors for machine learning later?