I have a data frame with text:

df = data.frame(id=c(1,2), text = c("My best friend John works and Google", "However he would like to work at Amazon as he likes to use python and stay at Canada"))

Without any preprocessing, how can I extract the named entities from each text, like this?

Example results:

dfresults = data.frame(id=c(1,2), ner_words = c("John, Google", "Amazon, python, Canada"))
Ken Benoit
Nathalie

1 Answer

You can do this without quanteda, using the spacyr package -- a wrapper around the spaCy library mentioned in your linked article.

Here, I have slightly edited your input data.frame.

df <- data.frame(id = c(1, 2), 
                 text = c("My best friend John works at Google.", 
                          "However he would like to work at Amazon as he likes to use Python and stay in Canada."),
                 stringsAsFactors = FALSE)

Then:

library("spacyr")
library("dplyr")

# -- need to do these before the next function will work:
# spacy_install()
# spacy_download_langmodel(model = "en_core_web_lg")

spacy_initialize(model = "en_core_web_lg")
#> Found 'spacy_condaenv'. spacyr will use this environment
#> successfully initialized (spaCy Version: 2.0.10, language model: en_core_web_lg)
#> (python options: type = "condaenv", value = "spacy_condaenv")

txt <- df$text
names(txt) <- df$id

spacy_parse(txt, lemma = FALSE, entity = TRUE) %>%
    entity_extract() %>%
    group_by(doc_id) %>%
    summarize(ner_words = paste(entity, collapse = ", "))
#> # A tibble: 2 x 2
#>   doc_id ner_words             
#>   <chr>  <chr>                 
#> 1 1      John, Google          
#> 2 2      Amazon, Python, Canada
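
If you need the result in the shape of the `dfresults` example from the question (numeric `id`, one row per input document), one possible follow-up is to join the summary back onto the original ids. The sketch below is only an illustration: `ner` is a hand-built stand-in that copies the output printed above so that the snippet runs without spaCy installed, and the base R `merge()` step is my assumption about how you might want to combine the two, not part of spacyr.

```r
# Original input, as in the answer above.
df <- data.frame(id = c(1, 2),
                 text = c("My best friend John works at Google.",
                          "However he would like to work at Amazon as he likes to use Python and stay in Canada."),
                 stringsAsFactors = FALSE)

# Stand-in (assumption) for the summarized output printed above;
# in practice this would be the tibble returned by the pipeline.
ner <- data.frame(doc_id = c("1", "2"),
                  ner_words = c("John, Google", "Amazon, Python, Canada"),
                  stringsAsFactors = FALSE)

# doc_id is character because it came from names(txt); convert it back.
ner$id <- as.numeric(ner$doc_id)

# Left join on id: a document with no detected entities would be kept,
# with ner_words set to NA.
dfresults <- merge(df["id"], ner[c("id", "ner_words")], by = "id", all.x = TRUE)
dfresults
```

The left join (`all.x = TRUE`) matters because a text with no detected entities would presumably have no row in the summary, so it would otherwise disappear from the result.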
Ken Benoit
  • If I receive an error like this: `Finding a python executable with spaCy installed... Error in set_spacy_python_option(python_executable, virtualenv, condaenv, : spaCy or language model en is not installed in any of python executables.`, how can I resolve it? – Nathalie Aug 14 '19 at 20:09
  • See the spacyr installation instructions at https://spacyr.quanteda.io. It looks like you have not properly installed **spacyr**. – Ken Benoit Aug 14 '19 at 23:18