
I am referring to the example in the Keras repository here: https://github.com/fchollet/keras/blob/master/examples/imdb_bidirectional_lstm.py

I would like to use my own dataset instead of IMDB. After inspecting the format of the default dataset, I see that each word in a sentence is replaced by its vocabulary index, with the vocabulary ordered by descending word frequency.

I looked through the Keras docs at https://keras.io/preprocessing/text/ for a method that would accomplish this, but none of them seem to work for me.

I have been trying the Tokenizer.fit_on_texts and Tokenizer.fit_on_sequences methods. fit_on_texts raises this error:

AttributeError: 'float' object has no attribute 'lower'

My input is a pandas Series of text.

Could anyone point out what I'm doing wrong? I have looked at the following thread, and it did not help:

Keras - Text Classification - LSTM - How to input text?

Thank you!

Wboy

1 Answer


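Found the error: one of the texts was NaN, which causes Tokenizer to break. pandas stores a missing value as a float NaN, so the tokenizer ends up calling .lower() on a float instead of a string, hence the AttributeError. Leaving this here in case it helps anyone :)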
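For anyone hitting the same thing, here is a rough sketch of how the NaN rows could be dropped before tokenizing. The helper names (clean_texts, texts_to_index_sequences), the max_words parameter, and the sample data are made up for illustration, and the import path assumes an older Keras release that still ships keras.preprocessing.text, so treat it as a starting point rather than the definitive fix.

```python
import pandas as pd


def clean_texts(texts: pd.Series) -> list:
    """Drop missing entries and return the rest as plain Python strings."""
    # dropna() removes NaN/None rows; astype(str) guards against stray
    # non-string values such as numbers that slipped into the column.
    return texts.dropna().astype(str).tolist()


def texts_to_index_sequences(texts: list, max_words: int = 20000) -> list:
    """Fit a Tokenizer and map each text to a list of word indices."""
    # Imported lazily so the cleaning step can be used without Keras.
    # Assumption: a Keras version that still provides this module.
    from keras.preprocessing.text import Tokenizer

    tokenizer = Tokenizer(num_words=max_words)
    tokenizer.fit_on_texts(texts)
    return tokenizer.texts_to_sequences(texts)


# Hypothetical sample data with one missing value.
raw = pd.Series(["a great movie", float("nan"), "terrible plot"])
texts = clean_texts(raw)
```

After this, texts_to_index_sequences(texts) should give integer sequences in the same spirit as the IMDB example. If you also have a label column, filter it with the same mask (for example labels[raw.notna()]) so texts and labels stay aligned.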

Wboy