2

I am working on an NLP task and have written the following code. Running it produces the error shown below. Any suggestion for resolving it would be helpful. I am using a Python 3 environment in Google Colab.

# Pytextrank
import pytextrank
import json

# Sample text
sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'

# Create dictionary to feed into json file

file_dic = {"id" : 0,"text" : sample_text}
file_dic = json.dumps(file_dic)
loaded_file_dic = json.loads(file_dic)

# Create test.json and feed file_dic into it.
with open('test.json', 'w') as outfile:
    json.dump(loaded_file_dic, outfile)

path_stage0 = "test.json"
path_stage1 = "o1.json"

# Extract keyword using pytextrank
with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        print(pytextrank.pretty_print(graf._asdict()))

I am getting the following error :

  AttributeError                            Traceback (most recent call last)      
  <ipython-input-33-286ce104df34> in <module>()      
       20 # Extract keyword using pytextrank      
       21 with open(path_stage1, 'w') as f:      
  ---> 22   for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
       23     f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))       
       24     print(pytextrank.pretty_print(graf._asdict()))      

      AttributeError: module 'pytextrank' has no attribute 'parse_doc'   
louis_guitton
shan
  • What have you done to try to solve this? Do you have any leads? I recommend the follow article: https://ericlippert.com/2014/03/05/how-to-debug-small-programs/. – AMC Dec 14 '19 at 06:12
  • It seems that the parse_doc attribute is not available in pytextrank, but I have seen people using it. Maybe there has been an update. I have tried to find an alternative to parse_doc that resolves the error. – shan Dec 14 '19 at 06:33
  • Maybe you could share some examples of code where it is used? – AMC Dec 14 '19 at 06:47
  • What the heck are you doing with your JSON? You dump a dict to a string, then immediately load an identical dict from that string, then dump it *again* to a file, then read the file with pytextrank? – user2357112 Dec 14 '19 at 08:22
  • @AlexanderCécile As shown above, I am using it on sample_text. The code shown above may itself be used to identify and resolve the error. – shan Dec 14 '19 at 18:46
  • @user2357112supportsMonica I am extracting keyword using pytextrank. It will be helpful if you assist in resolving the error and getting the keyword – shan Dec 14 '19 at 18:50

3 Answers

1

Implementation of TextRank in Python for use in spaCy pipelines

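import spacy
import pytextrank

# Load a spaCy model and add PyTextRank to its pipeline
# (this is the PyTextRank 2.x style of registering the component)
nlp = spacy.load('en_core_web_sm')
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name='textrank', last=True)

# Sample text
sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'

# Run the pipeline on the text, then print the ranked phrases
doc = nlp(sample_text)
for p in doc._.phrases:
    print(p.text)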
Kum_R
  • The code required for invoking PyTextRank had to change to support pipelines in spaCy 3.x and so what's shown above in this answer is no longer correct. FWIW, the new approach is simpler and based on a pipeline component factory. For sample code, see https://derwen.ai/docs/ptr/start/ – Paco Apr 20 '21 at 22:07
1

AttributeError: module 'pytextrank' has no attribute 'TextRank'

reproduce err:

run:

def summarize_text_returns_expected_summary(nlp, text):
    doc = process_text(nlp, text)
    if 'textrank' not in nlp.pipe_names:
        tr = pytextrank.TextRank()
        nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]

error:

AttributeError: module 'pytextrank' has no attribute 'TextRank'

fix:

step_1

check pytextrank installation

pip list | grep pytextrank
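
the same check can be run from inside the notebook, which also shows which major version (and so which api) is installed. a minimal sketch using only the standard library; installed_version is a made-up helper name, not part of pytextrank or spacy:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions relevant to this error (None means "not installed")
for name in ("pytextrank", "spacy"):
    print(name, installed_version(name))
```

a result of None for pytextrank means the package is missing from the environment the notebook is actually using.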

step_2

replace:

tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)

with:

nlp.add_pipe("textrank")

updated code:

import pytextrank  # importing registers the "textrank" factory with spacy

def summarize_text_returns_expected_summary(nlp, text):
    # add the component only once per nlp object
    if 'textrank' not in nlp.pipe_names:
        nlp.add_pipe("textrank")
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]

keep the if statement: if the function runs twice on the same nlp object without it, the second nlp.add_pipe("textrank") call raises a ValueError, because a component with that name is already in the pipeline.
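
the guard can be pulled out into a small reusable helper. a sketch only: ensure_pipe is a hypothetical name, and it assumes nothing beyond the two spacy members already used above (nlp.pipe_names and nlp.add_pipe):

```python
def ensure_pipe(nlp, name="textrank"):
    """Add the named component to the pipeline only if it is missing.

    Uses only `nlp.pipe_names` and `nlp.add_pipe(name)`.
    Returns True if the component was added, False if it was already there.
    """
    if name in nlp.pipe_names:
        return False
    nlp.add_pipe(name)
    return True
```

with a real pipeline, import pytextrank first (so the factory is registered), load the model with spacy.load("en_core_web_sm"), then call ensure_pipe(nlp). calling it a second time does nothing.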

why?

spacy pipeline: sequence of processing steps (tokenization, POS tagging, NER).

the incorrect code instantiates pytextrank.TextRank() by hand and then tries to add it to the pipeline. the installed module no longer exposes a TextRank attribute, hence the AttributeError.

tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)

correct code:

nlp.add_pipe("textrank")

importing pytextrank registers a component factory named "textrank" with spacy, so nlp.add_pipe("textrank") builds the component and adds it to the pipeline by name.

once the component is in the pipeline, its results are available through the ._ extension attributes on each document (e.g., doc._.phrases and doc._.textrank.summary()).

notes on module 'pytextrank' has no attribute 'parse_doc'

pytextrank works on text that has already been tokenized, tagged and parsed, so a spacy model has to be in the pipeline alongside it.

since:

the error message says that the parse_doc function does not exist in the installed pytextrank module. it was part of the older staged api used in the question (parse_doc, json_iter, pretty_print), which newer releases no longer provide.

do instead:

load a spacy model and add pytextrank to its pipeline.

e.g. the spacy small english model en_core_web_sm tokenizes and parses the text before textrank ranks the phrases.
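
to see which of the three calling styles mentioned on this page an installed copy supports, the module can be probed for the names involved. this is a heuristic sketch, not an official check: detect_pytextrank_api is a made-up name, and the version labels are assumptions based only on the code and error messages shown here.

```python
import importlib


def detect_pytextrank_api(module):
    """Guess the PyTextRank API generation from the names a module exposes."""
    if hasattr(module, "parse_doc"):
        return "1.x: staged functions (parse_doc, json_iter, pretty_print)"
    if hasattr(module, "TextRank"):
        return "2.x: pytextrank.TextRank() plus tr.PipelineComponent"
    return '3.x or later: nlp.add_pipe("textrank")'


try:
    print(detect_pytextrank_api(importlib.import_module("pytextrank")))
except ImportError:
    print("pytextrank is not installed")
```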

example:

import spacy
import pytextrank  # importing registers the "textrank" factory with spacy

def get_top_ranked_phrases(text):
    # load the small english model and add pytextrank to its pipeline
    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("textrank")
    doc = nlp(text)

    # collect the ranked phrases as plain dictionaries
    top_phrases = []

    for phrase in doc._.phrases:
        top_phrases.append({
            "text": phrase.text,
            "rank": phrase.rank,
            "count": phrase.count,
            "chunks": phrase.chunks
        })

    return top_phrases

sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'

top_phrases = get_top_ranked_phrases(sample_text)

for phrase in top_phrases:
    print(phrase["text"], phrase["rank"], phrase["count"], phrase["chunks"])

output: (image: output_of_sample.py)

code notes:

✔︎ load spacy small english model

✔︎ add pytextrank to pipeline

✔︎ store the top-ranked phrases

✔︎ examine the top-ranked phrases in the document

✔︎ print the top-ranked phrases
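
the question read its text from test.json and wrote results to o1.json. the notes above can be adapted to that file workflow roughly as follows. this is a sketch under assumptions: rank_json_lines is a hypothetical helper, the input is assumed to hold one json object per line with "id" and "text" keys (as test.json in the question does), and the output layout is invented for illustration rather than matching the old pretty_print format.

```python
import json


def rank_json_lines(path_in, path_out, extract_phrases):
    """Read one JSON object per line and write one ranked result per line.

    `extract_phrases` is any callable mapping text -> list of (phrase, rank).
    """
    with open(path_in, encoding="utf-8") as src, \
            open(path_out, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            phrases = extract_phrases(record["text"])
            result = {
                "id": record["id"],
                "phrases": [{"text": t, "rank": r} for t, r in phrases],
            }
            dst.write(json.dumps(result) + "\n")
```

with the pipeline from the example, the extractor could be lambda text: [(p.text, p.rank) for p in nlp(text)._.phrases], where nlp is a loaded model with "textrank" added. keeping the extractor as a parameter means the file handling can be tried without loading a model.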

references:

- DerwenAI

- https://spacy.io/universe/project/spacy-pytextrank

- textrank: bringing order into text

- keywords and sentence extraction with textrank (pytextrank)

- module 'pytextrank' has no attribute 'parse_doc'

- scattertext/issues/92

- AttributeError: module 'pytextrank' has no attribute 'TextRank' #2

patme
0

There's a newer release of PyTextRank which simplifies the calling code, and makes these steps unnecessary: https://spacy.io/universe/project/spacy-pytextrank

Paco
  • `tr = pytextrank.TextRank()` gives `AttributeError: module 'pytextrank' has no attribute 'TextRank'`. Solution (https://github.com/DerwenAI/pytextrank): use `nlp.add_pipe("textrank")` in place of `tr = pytextrank.TextRank() ; nlp.add_pipe(tr.PipelineComponent, name='textrank', last=True)` – Victoria Stuart Apr 19 '21 at 18:48
  • Example code is in the PyTextRank docs at https://derwen.ai/docs/ptr/start/ – Paco Apr 20 '21 at 22:08