Questions tagged [udpipe]

UDPipe is a free C++ library, shipped with a binary executable, for Natural Language Processing (NLP). It performs tokenization, part-of-speech tagging, lemmatization and dependency parsing of raw text.

Precompiled binaries for Windows, Linux and OS X are also available, as are a web service and a REST API.

For details see http://ufal.mff.cuni.cz/udpipe and https://github.com/ufal/udpipe .
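
To make the annotation steps above concrete, here is a minimal Python sketch that reads a fragment in CoNLL-U, the tab-separated format UDPipe writes its annotations in. The sample sentence and its annotations are hand-written for illustration and are not real UDPipe output; the Token class and the parse_conllu helper are hypothetical names, not part of any UDPipe API.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One CoNLL-U token line, reduced to the fields used below."""
    idx: int      # position in the sentence (tokenization)
    form: str     # surface form
    lemma: str    # lemmatization
    upos: str     # universal part-of-speech tag
    head: int     # index of the syntactic head, 0 for the root
    deprel: str   # dependency relation to the head


# Hand-written sample (an assumption, not produced by running UDPipe).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = (
    "# text = The dogs bark.\n"
    "1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_\n"
    "2\tdogs\tdog\tNOUN\tNNS\tNumber=Plur\t3\tnsubj\t_\t_\n"
    "3\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
)


def parse_conllu(text: str) -> List[Token]:
    """Parse regular token lines, skipping comments and blank lines."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword ranges such as "1-2" and empty nodes such as "1.1".
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        tokens.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], int(cols[6]), cols[7])
        )
    return tokens


tokens = parse_conllu(SAMPLE)
# Keep only the lemmas of nouns, a common first step in keyword extraction.
noun_lemmas = [t.lemma for t in tokens if t.upos == "NOUN"]
print(noun_lemmas)
```

Filtering on the upos field is the same idea several of the questions below rely on when they restrict an analysis to nouns or adjectives.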

37 questions
3 votes, 3 answers

Make udpipe_annotate() faster

I am currently working on a Text Mining document, where I want to extract relevant keywords from my text (note that I have a great many text documents). I am using the udpipe package. A great Vignette is online on…
R overflow
3 votes, 1 answer

How to make "words clustering" in R with udpipe package?

I am using the udpipe package in R to do some text mining. I have followed this tutorial: https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html#nouns__adjectives_used_in_same_sentence but now, I am a…
MysteryGuy
2 votes, 1 answer

udpipe_annotate() in r labels the same word differently if followed by punctuation

I'm doing a standard topic modelling task on nouns in newspaper articles using udpipe to annotate the article content. Using the function udpipe_annotate() I noticed that words together with the following punctuation mark sometimes were labelled as…
Hal
2 votes, 0 answers

NLP in R: working with tokenization in a CONLLU-style dataframe

I am working in a Portuguese Digital Humanities project using R. I created a CONLLU-style dataframe with the corpus data, using the UDPipe library: textAnnotated <- udpipe::udpipe_annotate(m_port, x = textCorpus) %>% as.data.frame() The beginning…
2 votes, 1 answer

udpipe (keywords_rake) how to link keywords to the document they were extracted from

I am using the function keywords_rake from the udpipe package (for R) to extract keywords from a bunch of documents. udmodel_en <- udpipe_load_model(file = dl$file_model) x <- udpipe_annotate(udmodel_en, x = data$text) x <-…
Carbo
1 vote, 0 answers

How to run the R RAKE function in udpipe across individual groups

Given the following sample data frame: Question <- c("Q1", "Q1", "Q1","Q1","Q2", "Q2", "Q2","Q2") Answer <- c("I like to be creative when I cook with crock pots.","I like to be creative when I cook with crock pots.", "I like to be…
Mark P.
1 vote, 1 answer

R extract most common word(s) / ngrams in a column by group

I wish to extract main keywords from the column 'title', for each group (1st column). Desired result in column 'desired title': Reproducible data: myData <- structure(list(group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,…
Yeshyyy
1 vote, 2 answers

spacy-udpipe with pytextrank to extract keywords from non-English text

I've been using pytextrank (https://github.com/DerwenAI/pytextrank/) with spacy and English models for keywords extraction - it works great! Now I need to process non-English texts and I found udpipe (https://github.com/TakeLab/spacy-udpipe) but it…
1 vote, 0 answers

Topic Modelling by Group using LDA in R

I am stuck at one problem. I am trying to categorize sentences into topics using LDA. I have done it, however the problem is: LDA is working on whole dataset and giving me topic terminologies across the dataset. I want to get the topic terminologies…
Rana Usman
1 vote, 1 answer

How to get future tense for a verb with udpipe

I have a large number of medical reports. I am trying to identify sentences that show a future action will be taken, e.g. 'I will prescribe a medication'. I am using the english-ewt model from udpipe and I have also tried english-gum, but neither gives me a…
Sebastian Zeki
1 vote, 1 answer

R - Parsing keywords from udpipe RAKE per article back to dataframe

I'm attempting to use udpipe's RAKE to generate a list of 25 RAKE tokens per document in a dataframe and write those tokens (plus a simple str_count) back to the dataframe. I constructed a for loop to handle, but instead I'm writing the same result…
Christopher Penn
1 vote, 0 answers

Text Mining responses with very varying answer lengths

I have a dataset of responses where people were asked to answer a set of questions. There's only one column of text data to process. My challenge is that only very few respondents have actually written long texts that I found easy to process and…
Dinesh
1 vote, 1 answer

inherits(x, "character") is not TRUE in R programming Shiny App

I am creating a Shiny app whose purpose is to take a text file as input and use the udpipe library to create a wordcloud, annotate etc... I am getting "inherits(x, "character") is not TRUE" when running the app. The problem comes from the "Annotate" tab as I am…
1 vote, 1 answer

Is it possible to modify spaCy by udpipe within the Rasa-NLU?

I have spent several days testing Rasa-NLU, which internally uses spaCy. I was greatly disappointed with the results for the Portuguese language. Trying to figure out how to improve the training data, I found an excellent script comparing spaCy with udpipe that can be…
luisdemarchi
1 vote, 2 answers

Find words in a corpus based on lemma

I am doing text mining with R and I have an "issue" I would like to solve... In order to find the reports in a corpus that contain a given word or expression most often, I use the kwic function from the quanteda package like this: result <- kwic…
MysteryGuy