31

I am currently trying to build a naive Bayes classifier as mentioned in this link. Referring to the line

X_new_tfidf = tfidf_transformer.transform(X_new_counts)

under the Training the Classifier subheading, I have a similar line in my code, X_new_counts = count_vect.transform(input.plot_movie). The transform function should take an iterable as its input. Here, input is a single record from a DataFrame, of type pd.Series, with the following entries, and I pass input.plot_movie to the transform function:

[Screenshot of the record's entries]

However, I get the following error: Iterable over raw text documents expected, string object received

How do I fix this error? I also referred to this answer where the person says that s is an iterable because it was assigned a string. I also came across this link where a TypeError: 'String' object is not iterable is encountered. Am I missing something here? The links seem to contradict each other.

EDIT: I just realized that input.plot_movie is of type unicode and decided to convert it to a string. I encounter the same error again.

thegreatcoder
  • Have you actually read those links? The custom `class String` defined in that blog post is not the same thing as `str`. And it's all about how to modify `class String` so it _is_ iterable, the same way `str` already is. (And so is `unicode`.) – abarnert Apr 12 '18 at 22:55
  • But anyway, a string (`str` or `unicode`) is an iterable over characters, not an iterable over "raw text documents", whatever those are. Without actually seeing your code, it's very hard to guess what you're doing wrong, but my first guess would be something like this: The function wants a list of files or strings or some kind of objects returned by some function from that library, and you have a directory full of files that you could read those objects from, but instead of reading those files into a list, you're just passing the directory name. – abarnert Apr 12 '18 at 22:58
  • I am not passing a directory name anywhere. I am trying to pass a string/text as input to make a prediction, just like how in the first link, they have passed an array of strings to predict. – thegreatcoder Apr 12 '18 at 23:17
  • I said it was just a wild guess, because you haven't shown us your code or explained what you're passing. If you want us to not make wild guesses, please read [mcve] in the help and make this an answerable question. – abarnert Apr 12 '18 at 23:19
  • But meanwhile: what makes you think you can pass a single string to a function that expects an array of strings? That normally doesn't work, and when it does work, it normally treats your string as a list of single characters, which isn't very useful. Wherever you're passing the string in the code you haven't shown us, why not pass a one-element array with a string in it, following whatever example you're following? – abarnert Apr 12 '18 at 23:21
  • Right. I will try that. – thegreatcoder Apr 13 '18 at 02:02
  • It worked, thanks! FYI, I really thought this explanation was self-sufficient. I had posted the line where I encountered the error and had also mentioned the data type, which is where the problem arose. And no files were actually read except to import the dataset. – thegreatcoder Apr 13 '18 at 03:11
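
The point made in the comments, that a string is an iterable over characters rather than over documents, can be sketched in plain Python. The variable name plot and its text are made up for illustration and are not taken from the question's dataset.

```python
# Hypothetical plot text standing in for a single record's plot field.
plot = "A detective hunts a killer"

# Iterating over a bare string yields its characters one by one.
chars = list(plot)[:3]

# Iterating over a one-element list yields the whole string exactly once,
# which is what a function expecting "an iterable of documents" needs.
docs = list([plot])

print(chars)
print(docs)
```

This is why wrapping the single string in a list is the usual fix when a function expects a collection of texts.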

2 Answers

80

The cause of this problem is that input.plot_movie is just a single string, but transform needs a list (or another iterable) of strings, here containing a single element.

The error can be removed by wrapping the string in a list when calling transform:

X_new_counts = count_vect.transform([input.plot_movie])
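
As a minimal sketch of the difference, assuming scikit-learn is installed: the training texts and the plot_movie variable below are hypothetical stand-ins for the question's data, which is not shown. Recent scikit-learn versions raise a ValueError when transform receives a bare string.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training texts standing in for the movie plots.
train_plots = [
    "a detective hunts a killer",
    "two friends travel across europe",
]
count_vect = CountVectorizer()
count_vect.fit(train_plots)

# Stand-in for input.plot_movie: one string, not a list of strings.
plot_movie = "a killer hunts two friends"

try:
    count_vect.transform(plot_movie)
    raised = False
except ValueError:
    # A bare string is rejected by the vectorizer.
    raised = True

# Wrapping the string in a list gives a one-row count matrix.
X_new_counts = count_vect.transform([plot_movie])
print(raised, X_new_counts.shape[0])
```

The resulting matrix has one row per document in the list, so a single wrapped string produces exactly one row to pass on to the TF-IDF transformer and the classifier.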
thegreatcoder
0

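The input should be wrapped in square brackets so that it is a list of documents rather than a bare string:

from sklearn.feature_extraction.text import CountVectorizer

input = ["input"]  # a list holding one document

cv = CountVectorizer()

cv.fit(input)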
Simas Joneliunas
xvinay28x