AttributeError: module 'pytextrank' has no attribute 'TextRank'
reproduce err:
run:
def summarize_text_returns_expected_summary(nlp, text):
    doc = process_text(nlp, text)
    if 'textrank' not in nlp.pipe_names:
        tr = pytextrank.TextRank()
        nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]
error:
AttributeError: module 'pytextrank' has no attribute 'TextRank'
fix:
step_1
check the pytextrank installation:
pip list | grep pytextrank
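the same check can be done from python, which helps where grep isn't available. a minimal sketch using only the standard library (python 3.8+); the helper name installed_version is made up for illustration:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# report both packages involved in the error
for name in ("pytextrank", "spacy"):
    print(name, installed_version(name) or "not installed")
```

if pytextrank shows up as not installed, install it first; if it is installed, move on to step_2.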
step_2
replace:
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
with:
nlp.add_pipe("textrank")
updated code:
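import pytextrank  # importing registers the "textrank" factory with spacy

def summarize_text_returns_expected_summary(nlp, text):
    doc = process_text(nlp, text)  # process_text is the asker's own helper
    # add the component by name, only once per pipeline
    if 'textrank' not in nlp.pipe_names:
        nlp.add_pipe("textrank")
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]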
omitting the if statement risks an error: the script won't check whether textrank is already in the pipeline, and spacy raises a ValueError when a component with the same name is added a second time.
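the guard can be pulled into a small helper so it is reusable across functions. a sketch only: ensure_pipe is a hypothetical name, and it assumes nothing beyond the pipe_names attribute and add_pipe method that a spacy pipeline object provides:

```python
def ensure_pipe(nlp, name="textrank"):
    """Add the named component only if the pipeline lacks it.

    Returns True if the component was added, False if it was already there.
    """
    if name in nlp.pipe_names:
        return False
    # string name: spacy looks up the registered factory and builds the component
    nlp.add_pipe(name)
    return True
```

usage: run import pytextrank first so the "textrank" factory is registered, then call ensure_pipe(nlp) before nlp(text). calling it repeatedly on the same pipeline is safe.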
why?
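spacy pipeline: a sequence of processing steps (tokenization, POS tagging, NER).
the incorrect code instantiates pytextrank.TextRank() manually, then attempts to add it to the pipeline. the installed pytextrank module exposes no TextRank attribute, hence the AttributeError: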
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
correct code:
nlp.add_pipe("textrank")
this adds the textrank component by its registered name, so spacy builds and registers it correctly (pytextrank must still be imported, otherwise the name is unknown to spacy).
adding textrank to the spacy pipeline registers its methods and attributes, and allows access via ._ on documents (e.g., doc._.textrank.summary()).
notes on module 'pytextrank' has no attribute 'parse_doc'
a parser is often a necessary component in an NLP pipeline.
it can be added to the pipeline alongside pytextrank.
since:
the error msg indicates that the parse_doc function is not found in the pytextrank module, potentially due to changes in the pytextrank library: some functions may have been removed, or may simply never have existed.
do instead:
load a spacy model that includes a parser, and add pytextrank to its pipeline.
i.e. the spacy small english model en_core_web_sm tokenizes the text before parsing it.
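before adding textrank, it can help to confirm that the loaded pipeline really contains a parser. a hedged sketch: missing_components is a hypothetical helper, and which component names to require is left to the caller ("parser" is only the default used here):

```python
def missing_components(nlp, required=("parser",)):
    """Return the required component names that are absent from the pipeline."""
    return [name for name in required if name not in nlp.pipe_names]
```

usage: after nlp = spacy.load("en_core_web_sm"), an empty list from missing_components(nlp) means the parser is present; a blank pipeline would report ["parser"] instead.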
example:
import spacy
import pytextrank  # not referenced directly, but the import registers "textrank"

def get_top_ranked_phrases(text):
    # load the small english model and add textrank to its pipeline
    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("textrank")
    doc = nlp(text)
    # collect the ranked phrases as plain dicts
    top_phrases = []
    for phrase in doc._.phrases:
        top_phrases.append({
            "text": phrase.text,
            "rank": phrase.rank,
            "count": phrase.count,
            "chunks": phrase.chunks
        })
    return top_phrases

sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'
top_phrases = get_top_ranked_phrases(sample_text)
for phrase in top_phrases:
    print(phrase["text"], phrase["rank"], phrase["count"], phrase["chunks"])
output:
output_of_sample.py
code notes:
✔︎ load spacy small english model
✔︎ add pytextrank to pipeline
✔︎ store the top-ranked phrases
✔︎ examine the top-ranked phrases in the document
✔︎ print the top-ranked phrases
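to save the phrases instead of printing them, they can be dumped as JSON. a sketch under assumptions: phrases_to_json is a hypothetical helper, it expects objects with the text, rank, count and chunks attributes used in the example, and it converts each chunk with str() on the assumption that chunks are spacy spans, which json cannot serialize directly:

```python
import json


def phrases_to_json(phrases, limit=10):
    """Serialize up to `limit` ranked phrases to a JSON string."""
    records = [
        {
            "text": p.text,
            "rank": round(p.rank, 4),
            "count": p.count,
            # chunks are converted to plain strings so json can handle them
            "chunks": [str(c) for c in p.chunks],
        }
        for p in list(phrases)[:limit]
    ]
    return json.dumps(records, indent=2)
```

usage: print(phrases_to_json(doc._.phrases, limit=5)) after doc = nlp(text).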
references:
- DerwenAI
- https://spacy.io/universe/project/spacy-pytextrank
- textrank: bringing order into text
- keywords and sentence extraction with textrank (pytextrank)
- module 'pytextrank' has no attribute 'parse_doc'
- scattertext/issues/92
- AttributeError: module 'pytextrank' has no attribute 'TextRank' #2