I am using the function keywords_rake from the udpipe package (for R) to extract keywords from a set of documents:
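library(udpipe)
# dl is assumed to come from an earlier udpipe_download_model() call
udmodel_en <- udpipe_load_model(file = dl$file_model)
# Annotate every document, then flatten to one row per token
x <- udpipe_annotate(udmodel_en, x = data$text)
x <- as.data.frame(x)
# Candidate keywords are built from nouns (NN) and adjectives (JJ) only
keywords <- keywords_rake(x = x, term = "lemma", group = "doc_id",
                          relevant = x$xpos %in% c("NN", "JJ"), ngram_max = 2)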
where data looks like this (each row is a separate document):
text
"cats are nice but dogs are better..."
"I really like dogs..."
"red flowers are pretty, especially roses..."
"once I saw a blue whale ..."
....
However, the output does not say which document each keyword came from; it is a single list of keywords for all the documents together.
How can I link these keywords to the documents they were taken from, i.e. get a list of keywords for each document?
Something like this:
      keywords
doc1  dog, cat, blue whale
doc2  dog
doc3  red flower, tower, Donald Trump
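
One possible way to get that shape, as a rough sketch rather than a confirmed udpipe recipe: rebuild the candidate phrases per document in base R and keep only those that keywords_rake reported. This assumes a candidate keyword is a run of consecutive relevant tokens inside one document. The x and keywords objects below are small hand-made stand-ins (hypothetical data, only the columns that are used), so the block runs without a model; with real data they would be the objects from the code above.

```r
# Hypothetical stand-in for as.data.frame(udpipe_annotate(...)): one row per token.
x <- data.frame(
  doc_id = c("doc1", "doc1", "doc1", "doc1", "doc2", "doc2", "doc2"),
  lemma  = c("blue", "whale", "be", "dog", "I", "like", "dog"),
  xpos   = c("JJ", "NN", "VBZ", "NN", "PRP", "VBP", "NN"),
  stringsAsFactors = FALSE
)
# Hypothetical stand-in for the keywords_rake() result; only `keyword` is used.
keywords <- data.frame(keyword = c("blue whale", "dog"), stringsAsFactors = FALSE)

relevant <- x$xpos %in% c("NN", "JJ")
n <- nrow(x)

# A new run starts whenever the document or the relevant flag changes
# (assumption: this matches how candidate phrases are formed).
new_run <- c(TRUE, x$doc_id[-1] != x$doc_id[-n] | relevant[-1] != relevant[-n])
run_id <- cumsum(new_run)

# Collapse each run of relevant lemmas into one candidate phrase,
# remembering the document the run belongs to.
phrase <- tapply(x$lemma[relevant], run_id[relevant], paste, collapse = " ")
doc <- tapply(x$doc_id[relevant], run_id[relevant], function(d) d[1])
candidates <- data.frame(doc_id = as.vector(doc), keyword = as.vector(phrase),
                         stringsAsFactors = FALSE)

# Keep only phrases that were reported as keywords, drop repeats within a
# document, then paste them into one row per document.
hits <- unique(candidates[candidates$keyword %in% keywords$keyword, ])
keywords_by_doc <- aggregate(keyword ~ doc_id, data = hits,
                             FUN = paste, collapse = ", ")
print(keywords_by_doc)
```

With the toy data this gives "blue whale, dog" for doc1 and "dog" for doc2. Documents with no matching keyword do not appear in keywords_by_doc at all, and the RAKE scores themselves are still computed over the whole corpus; only the mapping back to documents is added here.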