I'm building an RNN and am having trouble passing the data into it. The CSV file I'm reading from has a sentence column and a label column that holds a binary classification value (1 or 0). This is how I'm preprocessing right now:
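import pandas as pd
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

data = pd.read_csv(r'/cybersecurity-sqlinjection/sqli.csv', encoding='utf-8')
# norm=None disables normalization (scikit-learn accepts 'l1', 'l2', or None here)
vectorizer = TfidfVectorizer(norm=None, smooth_idf=False, analyzer='word', stop_words=stopwords.words('english'))
# Cast to unicode strings so missing values become text instead of NaN
sentence_vectors = vectorizer.fit_transform(data['Sentence'].values.astype('U'))
df = pd.DataFrame(sentence_vectors.toarray())
X = df[df.columns]
y = data['Label']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=42)
X.head()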
Next, I passed X_train to an LSTM model. At that point I got errors about the shape of the data being passed to the model, so I applied the fix from the first response on this issue and added the following to the end of my code, before feeding the data into the model:
X_train_shape = X_train.shape  # (19327, 15016)
X_train = X_train.values.reshape(-1, 1, 15016)
model = keras.models.Sequential()
model.add(keras.layers.LSTM(15, input_shape=(1, 15016), return_sequences=True))
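The reshape above turns each TF-IDF row into a sequence of length one, which matches the (timesteps, features) layout that input_shape=(1, 15016) describes. A minimal sketch on a toy array, where the sizes 4 and 6 are stand-ins (assumptions for illustration) for 19327 and 15016:

```python
import numpy as np

# Toy stand-ins for the real sizes (19327 samples, 15016 TF-IDF features).
n_samples, n_features = 4, 6
X = np.arange(n_samples * n_features, dtype="float32").reshape(n_samples, n_features)

# Same reshape as in the question: (samples, features) -> (samples, 1, features).
X_seq = X.reshape(-1, 1, n_features)

print(X.shape)      # 2-D: one row per sentence
print(X_seq.shape)  # 3-D: one timestep per sentence, all features in that step
```

The values are unchanged; only a timestep axis of size 1 is added, so the LSTM sees every sentence as a single step.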
Now I get this error instead: ValueError: Shapes (None, 1) and (None, 1, 10) are incompatible
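One possible reading of the two shapes in that message: (None, 1) looks like the labels (one 0/1 value per sample), while (None, 1, 10) looks like a model output that still carries the timestep axis because of return_sequences=True and ends in 10 units. The layer with 10 units is not in the snippet shown here, so that part is a guess. Under those assumptions, a minimal sketch of a model whose output shape matches binary labels might look like this. The toy sizes, the Dense(1, sigmoid) head, and the binary_crossentropy loss are illustrative choices, not the original model:

```python
import numpy as np
from tensorflow import keras

# Toy data standing in for the reshaped TF-IDF matrix and the 0/1 labels.
n_samples, n_features = 8, 20
X_seq = np.random.rand(n_samples, 1, n_features).astype("float32")
y = np.random.randint(0, 2, size=(n_samples,))

model = keras.models.Sequential()
model.add(keras.Input(shape=(1, n_features)))
# return_sequences defaults to False, so the timestep axis is dropped: (None, 15).
model.add(keras.layers.LSTM(15))
# One sigmoid unit gives one probability per sample: (None, 1).
model.add(keras.layers.Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(X_seq, y, epochs=1, verbose=0)
print(model.output_shape)
```

Whether this applies depends on the layers and loss that follow the LSTM in the real model, which would need to be checked against the full code.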
I'm not sure what the issue is and would appreciate any help!