I recently tried to visualize TextRank results using the code below, but I noticed that the terms in the graph are not lemmatized. Is there a way to change the code so that all words in textrank_df['parse'] are lemmatized? I checked the pipeline components and all the required ones are in place ('tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner'), so I'm not sure what went wrong.
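import pytextrank  # importing registers the "textrank" pipeline factory
import spacy
import scattertext as st

nlp = spacy.load('en_core_web_sm')
# Append pytextrank's component to the end of the spaCy pipeline
nlp.add_pipe("textrank", last=True)

# Parse every text in the 'Combined' column into a spaCy Doc
convention_df = textrank_df.assign(
    parse=lambda textrank_df: textrank_df['Combined'].apply(nlp),
)

# Build the scattertext corpus, using TextRank phrases as features
corpus = st.CorpusFromParsedDocuments(
    convention_df,
    category_col='Response Variable',
    parsed_col='parse',
    feats_from_spacy_doc=st.PyTextRankPhrases(),
).build()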
I tried Code 1 below, but it raises: AttributeError: module 'pytextrank' has no attribute 'TextRank'. I suspect the cause is the format of the 'parse' column after this change, since it now holds lists of lemma strings rather than spaCy Doc objects.
Code 1:
convention_df = textrank_df.assign(
    parse=lambda textrank_df: textrank_df['Combined'].apply(
        lambda x: [token.lemma_ for token in nlp(x)]
    )
)
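To test the suspicion that the format of the 'parse' column is what changed, a quick type check like the sketch below could be run on it. Note that parse_column_types is a hypothetical helper written only for illustration, not part of spaCy, pytextrank, or scattertext, and the sample lists are stand-in data rather than my real documents.

```python
from collections import Counter


def parse_column_types(values):
    """Count the Python type names of the items in an iterable (e.g. a DataFrame column)."""
    return Counter(type(value).__name__ for value in values)


# Stand-in data (an assumption): lists of lemma strings, the shape that the
# list comprehension in the snippet above produces for each row.
sample_parse = [["the", "cat", "sit"], ["a", "dog", "run"]]
type_counts = parse_column_types(sample_parse)
print(type_counts)  # Counter({'list': 2})
```

On the real data this would be called as parse_column_types(convention_df['parse']). With the original code the items should be reported as 'Doc', and after the change above as 'list'. That would at least confirm the column no longer holds parsed documents, although it does not by itself explain the AttributeError.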
I also tried Code 2, which passes use_lemmas=True to PyTextRankPhrases(), but that did not work either: the words are still shown in their original form.
Code 2:
corpus = st.CorpusFromParsedDocuments(
    convention_df,
    category_col='Response Variable',
    parsed_col='parse',
    feats_from_spacy_doc=st.PyTextRankPhrases(use_lemmas=True),
).build()