I'm currently trying to extract noun phrases from sentences. The sentences are stored in a column of an Excel file. Here is the Python code:

import pandas as pd
import spacy

df = pd.read_excel("xxx.xlsx")

nlp = spacy.load("en_core_web_md")
for row in range(len(df)):
    doc = nlp(df.loc[row, "Title"])
    for np in doc.noun_chunks:
        print(np.text)

But I got this error:

Traceback (most recent call last):
  File "/Users/pusinov/PycharmProjects/textsummarizer/paper_term_extraction.py", line 10, in <module>
    doc = nlp(df.loc[row, "Title"])
  File "/Users/pusinov/PycharmProjects/textsummarizer/venv/lib/python3.9/site-packages/spacy/language.py", line 1002, in __call__
    doc = self._ensure_doc(text)
  File "/Users/pusinov/PycharmProjects/textsummarizer/venv/lib/python3.9/site-packages/spacy/language.py", line 1093, in _ensure_doc
    raise ValueError(Errors.E866.format(type=type(doc_like)))
ValueError: [E866] Expected a string or 'Doc' as input, but got: <class 'float'>.

Can anyone help me improve this code? Thank you very much.

P.S. I'm still new to Python.

  • Always put the full error message (starting at the word "Traceback") in the question (not in comments) as text (not a screenshot, not a link to an external portal). It contains other useful information. – furas Dec 17 '21 at 04:18
  • You didn't show the full error message, we can't run your code, and we can't read your mind, so we don't know which line causes the problem. At the moment we can only suggest using `print()` and `print(type())` to see what you have in the variables on the line that causes the problem. It seems you get float values instead of strings. – furas Dec 17 '21 at 04:20
  • btw: `for index, row in df.iterrows():` – furas Dec 17 '21 at 04:21
  • Thank you. I have updated the question with the full error message. – researchcollege111 Dec 18 '21 at 05:38
  • The error message shows a problem with `doc = nlp(df.loc[row, "Title"])` but you don't have it in your code. You could still use `print()` and `print(type())` to see what you have in `df.loc[row, "Title"]`. It seems you have a float value instead of a string. You may need to convert the value to a string before using it in `nlp()`. – furas Dec 18 '21 at 12:34
  • Thank you @furas, you are right. I have to convert the value to a string (`str`). Problem solved. – researchcollege111 Dec 18 '21 at 15:52
  • Specifically: `doc = nlp(str(df.loc[row, "Title"]))`. I also updated the code in the question. – researchcollege111 Dec 18 '21 at 16:03

3 Answers

Do a null-value analysis. If you have any null values in your dataset, drop them.
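As a rough sketch of what that check could look like: the small DataFrame below is a made-up stand-in for the Excel file (the titles are invented), and it assumes the empty cells arrive as NaN, which is a float and would explain the E866 error in the question.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("xxx.xlsx"); the second row
# mimics an empty cell, which pandas represents as NaN (a float).
df = pd.DataFrame(
    {"Title": ["A fast neural parser", float("nan"), "Noun phrase extraction"]}
)

# Null-value analysis: count the missing titles first.
missing = df["Title"].isna().sum()
print(f"missing titles: {missing}")

# Drop the rows whose Title is missing and renumber the index,
# so a positional loop over range(len(clean)) stays valid.
clean = df.dropna(subset=["Title"]).reset_index(drop=True)

titles = clean["Title"].tolist()
print(titles)
```

Every remaining value in `titles` is a plain string, so each one can be passed to `nlp()`.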

Victor S
  • This actually helped, an empty row was the issue in my case. Thanks! – ᴍᴇʜᴏᴠ Oct 17 '22 at 17:20

I faced a similar issue and I fixed it using

df['Title'] = df['Title'].astype(str)

This fixes the problem because every value in the column is converted to str. The error usually appears when a cell holds a number, NaN, or null instead of text.
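One caveat worth hedging: a plain astype(str) may turn a missing cell into the literal text "nan" (this has been the behaviour in older pandas releases), and that text would then be parsed like any other title. A minimal sketch of one way around it, using invented sample data, is to fill the missing cells with an empty string first and skip the empty results:

```python
import pandas as pd

# Hypothetical sample: one real title, one empty cell (NaN), one numeric cell.
df = pd.DataFrame({"Title": ["Deep learning survey", float("nan"), 2021]})

# Replace missing cells with "" before converting everything to str.
df["Title"] = df["Title"].fillna("").astype(str)

# Keep only the non-empty titles for further processing.
texts = [title for title in df["Title"] if title]
print(texts)
```

Here the numeric cell becomes the string "2021" and the empty cell is left out rather than being analysed as a word.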

Suraj Rao

You might need to set the column type to string.

df['Title'] = df['Title'].astype('string')
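Note that the 'string' dtype keeps missing cells as missing values (pd.NA) rather than turning them into text, so on its own it may not be enough if the column contains empty cells. The sketch below, on an invented two-row Series, shows the missing value surviving the conversion and one possible way to fill it:

```python
import pandas as pd

# Hypothetical column with one real title and one empty cell.
titles = pd.Series(["Graph neural networks", float("nan")], name="Title")

typed = titles.astype("string")

# The empty cell is still flagged as missing after the conversion.
still_missing = typed.isna().tolist()
print(still_missing)

# Filling the gaps yields plain Python strings for every row.
texts = typed.fillna("").tolist()
print(texts)
```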

Uzzi Emuchay