
I have a sentence like this:

sent <- "She likes long walks on the beach with her dogs."

Let's say I tokenize the sentence by word. What NLP tools can I use to get data on the pronouns in this sentence, such as subject (first person, second person, third person) and type (possessive, reflexive, etc.)?

Christopher Costello
  • This is called part-of-speech tagging. It's an active research area. This may get you started: http://smart-statistics.com/part-speech-tagging-r/ – neilfws Apr 13 '18 at 04:45
  • I know what part of speech tagging is. But the link doesn't tell me how to tell anything about the pronouns besides the fact that they are pronouns. – Christopher Costello Apr 13 '18 at 04:55

1 Answer


Short answer: you have to implement additional (appropriate) heuristics on top of the POS tags. For example, a quick and dirty approach to detecting the SUBJECT-VERB-OBJECT pattern is to search for NOUN-VERB-NOUN triples (or PRONOUN-VERB-NOUN), as recommended in the question "Extract triplet subject, predicate, and object sentence". I am not sure there is an advanced NLP package in R that does this reliably yet.

First, set up a POS tagger as described at http://smart-statistics.com/part-speech-tagging-r/ (any POS tagging package will do). Install and load the packages:

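library(devtools)
# Install the tagger and tokenizer from GitHub
devtools::install_github("bnosac/RDRPOSTagger")
library(RDRPOSTagger)
devtools::install_github("ropensci/tokenizers")
library(tokenizers)
# dplyr provides %>%, filter(), slice(), and pull() used below
library(dplyr)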

Then create the tagging on your data:

sent <- "She likes long walks on the beach with her dogs."
unipostagger <- rdr_model(language = "English", annotation = "UniversalPOS")
pos <- rdr_pos(unipostagger, sent)

> pos
   doc_id token_id token   pos
1      d1        1   She  PRON
2      d1        2 likes  VERB
3      d1        3  long   ADJ
4      d1        4 walks  VERB
5      d1        5    on   ADP
6      d1        6   the   DET
7      d1        7 beach  NOUN
8      d1        8  with   ADP
9      d1        9   her  PRON
10     d1       10  dogs  NOUN
11     d1       11     . PUNCT

And then extract the pattern:

> subj <- pos %>% filter(grepl("PRON|NOUN", pos)) %>% slice(1) %>% pull(token)
> verb <- pos %>% filter(grepl("VERB", pos)) %>% slice(1) %>% pull(token)
> obj <- pos %>% filter(grepl("PRON|NOUN", pos)) %>% slice(n()) %>% pull(token)
> paste(subj, verb, obj)
[1] "She likes dogs"

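This heuristic simply takes the first pronoun or noun as the subject, the first verb as the predicate, and the last pronoun or noun as the object, so its effectiveness will depend on the complexity of your sentences.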
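
For the pronoun details asked about in the question (person and type), one possible heuristic is a hand-written lookup table joined against the PRON rows of the tagger output. The sketch below uses base R only. The table `pronoun_features` and its labels are my own assumptions, not part of RDRPOSTagger or any other package, and the `pos` data frame is a hand-copied subset of the tagger output shown above.

```r
# Hypothetical lookup table of English personal pronouns (an assumption for
# illustration, not taken from any package). Ambiguous forms get one label:
# "her" can be object or possessive and is listed as possessive here.
pronoun_features <- data.frame(
  token  = c("i", "me", "my", "mine", "myself",
             "you", "your", "yours", "yourself",
             "he", "him", "his", "himself",
             "she", "her", "hers", "herself",
             "it", "its", "itself",
             "we", "us", "our", "ours", "ourselves",
             "they", "them", "their", "theirs", "themselves"),
  person = c(rep("first", 5), rep("second", 4), rep("third", 4),
             rep("third", 4), rep("third", 3), rep("first", 5),
             rep("third", 5)),
  type   = c("subject", "object", "possessive", "possessive", "reflexive",
             "subject/object", "possessive", "possessive", "reflexive",
             "subject", "object", "possessive", "reflexive",
             "subject", "possessive", "possessive", "reflexive",
             "subject/object", "possessive", "reflexive",
             "subject", "object", "possessive", "possessive", "reflexive",
             "subject", "object", "possessive", "possessive", "reflexive"),
  stringsAsFactors = FALSE
)

# A few rows of the tagger output, copied by hand so the sketch runs
# without RDRPOSTagger installed.
pos <- data.frame(
  token = c("She", "likes", "her", "dogs"),
  pos   = c("PRON", "VERB", "PRON", "NOUN"),
  stringsAsFactors = FALSE
)

# Keep only the pronouns and look up their features (case-insensitive).
# Tokens missing from the table get NA for person and type.
pronouns <- pos[pos$pos == "PRON", , drop = FALSE]
idx <- match(tolower(pronouns$token), pronoun_features$token)
pronouns$person <- pronoun_features$person[idx]
pronouns$type   <- pronoun_features$type[idx]
pronouns
```

On this sentence the lookup labels "She" as third person, subject and "her" as third person, possessive. A fixed table cannot resolve forms such as "her" or "his" from context, so distinguishing object from possessive use would need an extra rule, for example checking whether the next token is tagged NOUN.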

mysteRious