I'm new to ML and am trying to classify text into two categories. My dataset is built from medical texts with Tokenizer; it's unbalanced, with 572 records for training and 471 for testing.
I'm having a hard time getting a model to produce varied predictions: almost all of the output values are the same. I've tried models from examples like this and tweaked the parameters myself, but the output never makes sense.
Here is the tokenized and prepared data
Here is the script: Gist
Sample model that I used:
# Imports assume TensorFlow's bundled Keras; vocab_size, train_data,
# train_labels, test_data and test_labels are defined in the script above.
from tensorflow import keras
from tensorflow.keras import layers

sequential_model = keras.Sequential([
    layers.Dense(15, activation='tanh', input_dim=vocab_size),
    layers.BatchNormalization(),
    layers.Dense(8, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(1, activation='sigmoid')
])
sequential_model.summary()
sequential_model.compile(optimizer='adam',
                         loss='binary_crossentropy',
                         metrics=['acc'])
train_history = sequential_model.fit(train_data,
                                     train_labels,
                                     epochs=15,
                                     batch_size=16,
                                     validation_data=(test_data, test_labels),
                                     class_weight={1: 1, 0: 0.2},
                                     verbose=1)
Unfortunately I can't share the datasets. I've also tried using keras.utils.to_categorical on the class labels, but it didn't help.
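Since the real data can't be shared, here is a minimal sketch with made-up labels and outputs of how the class weights and the spread of predictions could be checked. The helper names `balanced_class_weights` and `prediction_summary` are hypothetical, the 5:1 class ratio is an assumption (not the actual split of the 572 training records), and the weighting formula is the common "balanced" heuristic rather than anything taken from the script.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def prediction_summary(probabilities, threshold=0.5):
    """Count how many predictions fall on each side of the threshold."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    return dict(Counter(predicted))


# Hypothetical labels: five negatives for every positive (NOT the real dataset).
example_labels = [0] * 500 + [1] * 100
weights = balanced_class_weights(example_labels)

# Hypothetical sigmoid outputs that barely differ from each other,
# similar in spirit to the "almost all values are the same" symptom.
example_probs = [0.48, 0.49, 0.47, 0.51, 0.48]
summary = prediction_summary(example_probs)

print("class weights:", weights)
print("predicted class counts:", summary)
```

With a 5:1 ratio the heuristic gives the minority class five times the weight of the majority class, which is the same proportion as the `{1: 1, 0: 0.2}` passed to `fit`, just on a different scale. In practice the probabilities would come from something like `sequential_model.predict(test_data)`, flattened to a list.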