There are updates in the new online documentation for PyTextRank; in particular, see the "Getting Started" page at https://derwen.ai/docs/ptr/start/ for example code. Similar code is also shown in the sample.py script in the GitHub repo.
BTW, the most recent release is 3.0.1, which tracks the new spaCy 3.x updates.
Here's a simple usage:
import spacy
import pytextrank
# example text
text = "the ballistic nuclear threat can be thwarted by building a nuclear shield"
# load a spaCy model, depending on language, scale, etc.
nlp = spacy.load("en_core_web_sm")
# add PyTextRank to the spaCy pipeline
nlp.add_pipe("textrank", last=True)
doc = nlp(text)
# examine the top-ranked phrases in the document
for p in doc._.phrases:
    print("{:.4f} {:5d} {}".format(p.rank, p.count, p.text))
    print(p.chunks)
The output would be:
0.1712 1 a nuclear shield
[a nuclear shield]
0.1652 1 the ballistic nuclear threat
[the ballistic nuclear threat]
If you want to visualize the lemma graph in Graphviz or other libraries that read the DOT file format, you can add:
tr = doc._.textrank
tr.write_dot(path="graph.dot")
That will write output to a "graph.dot" file. See the Graphviz docs for examples of how to read and render it.
FWIW, we are currently working on integration of the kglab library, which will open up a much broader range of graph manipulation and visualization capabilities, since it integrates with many other libraries and file formats.
Also, if you have any suggestions or requests in terms of how you'd like to visualize results from PyTextRank, it's really helpful to create an issue at https://github.com/DerwenAI/pytextrank/issues and our developer community can help more there.
My apologies if I'm misinterpreting what you mean by "present the text as a graph" — another way to think about that would be to use the displaCy dependency visualizer, which shows a grammatical dependency graph of the tokens in a sentence. There's an example given in the spaCy tutorial.