I just prepared my text data using the Keras Tokenizer:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
VOCAB_SIZE = 10000
tokenizer = Tokenizer(num_words=VOCAB_SIZE)
tokenizer.fit_on_texts(X_train)
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)
I know all sequences need to be padded to the same length before they can be fed into a neural network. How should I use the pad_sequences
function from Keras to do this? Would it be the following (I'm not sure about the maxlen argument)?
X_train_seq_padded = pad_sequences(X_train_seq, maxlen=VOCAB_SIZE)
X_test_seq_padded = pad_sequences(X_test_seq, maxlen=VOCAB_SIZE)
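For context on what I'm asking: my understanding is that maxlen controls the padded sequence length (number of tokens per sample), which is a separate quantity from the vocabulary size. Below is a small pure-Python sketch of what I believe pad_sequences does by default (pre-padding with 0 and pre-truncating); the helper name, the toy data, and the choice of padding to the longest training sequence are all my own assumptions, not Keras code.

```python
# Sketch of Keras pad_sequences' default behavior (padding='pre',
# truncating='pre', value=0), written in pure Python for clarity.
# pad_sequences_like_keras and the toy sequences are hypothetical.

def pad_sequences_like_keras(sequences, maxlen, value=0):
    """Pre-pad (and pre-truncate) each sequence to length maxlen."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]  # pre-truncate: keep the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

# Toy token-id sequences standing in for X_train_seq.
X_train_seq = [[5, 12, 7], [3], [8, 2, 9, 4, 1]]

# One common choice: pad to the length of the longest training sequence,
# rather than to the vocabulary size.
maxlen = max(len(s) for s in X_train_seq)  # 5 here

X_train_seq_padded = pad_sequences_like_keras(X_train_seq, maxlen=maxlen)
# Every row now has length maxlen, e.g. [3] becomes [0, 0, 0, 0, 3].
```

The same maxlen would then be reused when padding X_test_seq so train and test inputs share one shape.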