I have the following data frame, which contains reviews that customers have left on a restaurant website:

id <- c(1, 2, 3, 4, 5, 6)
review <- c("the food was very delicious and hearty - perfect to warm up during a freezing winters day",
            "Excellent service as usual",
            "Love this place!",
            "Service and quality of food first class",
            " Customer services was exceptional by all staff",
            "excellent services")
df <- data.frame(id, review)

Now I am looking for a way (preferably without using a for loop) to find the part-of-speech labels in each customer's review in R.

AliCivil

3 Answers

Considering that the id column in your example is simply the row index, I believe you can obtain your desired output with the pos() function from the qdap package.

library(qdap)
pos(df$review)

If you do need grouping because of multiple reviews per customer, you can use

pos_by(df$review,df$id)
mtoto

This is a pretty straightforward adaptation of the example on the Maxent_POS_Tag_Annotator help page.

df<-data.frame(id, review, stringsAsFactors=FALSE) 

library(NLP)
library(openNLP)

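review.pos <- 
  sapply(df$review, function(ii) {
    # Treat the whole review as a single sentence spanning all characters
    a2 <- Annotation(1L, "sentence", 1L, nchar(ii))
    # Add word tokens, then POS tags for those tokens
    a2 <- annotate(ii, Maxent_Word_Token_Annotator(), a2)
    a3 <- annotate(ii, Maxent_POS_Tag_Annotator(), a2)
    # Keep the word annotations and pull out each word's POS feature
    a3w <- subset(a3, type == "word")
    tags <- sapply(a3w$features, `[[`, "POS")
    # Pair every word with its tag as "word/TAG"
    sprintf("%s/%s", as.String(ii)[a3w], tags)
  }, USE.NAMES = FALSE)  # unnamed list, so it prints as [[1]], [[2]], ...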

Which results in this output:

#[[1]]
# [1] "the/DT"       "food/NN"      "was/VBD"      "very/RB"      "delicious/JJ"
# [6] "and/CC"       "hearty/NN"    "-/:"          "perfect/JJ"   "to/TO"       
#[11] "warm/VB"      "up/RP"        "during/IN"    "a/DT"         "freezing/JJ" 
#[16] "winters/NNS"  "day/NN"      
#
#[[2]]
#[1] "Excellent/JJ" "service/NN"   "as/IN"        "usual/JJ"    
#
#[[3]]
#[1] "Love/VB"  "this/DT"  "place/NN" "!/."     
#
#[[4]]
#[1] "Service/NNP" "and/CC"      "quality/NN"  "of/IN"       "food/NN"    
#[6] "first/JJ"    "class/NN"   
#
#[[5]]
#[1] "Customer/NN"    "services/NNS"   "was/VBD"        "exceptional/JJ"
#[5] "by/IN"          "all/DT"         "staff/NN"      
#
#[[6]]
#[1] "excellent/JJ" "services/NNS"

It should be relatively straightforward to adapt this to whatever format you want.
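As one possible adaptation, the word/tag strings can be reshaped into a long data frame with one row per token. This is only a sketch in base R: `tagged` is a hand-copied stand-in for the first five tokens of review 1 and all of review 2 from the output above, and the names `ids`, `to_rows` and `pos_long` are made up for illustration.

```r
# Stand-in for review.pos, copied from the output shown above.
tagged <- list(
  c("the/DT", "food/NN", "was/VBD", "very/RB", "delicious/JJ"),
  c("Excellent/JJ", "service/NN", "as/IN", "usual/JJ")
)
ids <- c(1, 2)  # the matching df$id values

# Split each "word/TAG" string at its last slash, so a token that itself
# contains "/" still keeps the correct tag.
to_rows <- function(id, toks) {
  data.frame(
    id   = id,
    word = sub("/[^/]*$", "", toks),
    tag  = sub("^.*/", "", toks),
    stringsAsFactors = FALSE
  )
}

# One data frame per review, stacked into a single long table.
pos_long <- do.call(rbind, Map(to_rows, ids, tagged))
rownames(pos_long) <- NULL

pos_long
```

With the real result, `Map(to_rows, df$id, review.pos)` should slot in the same way, assuming review.pos is a list of character vectors as printed above.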

Jota

If you don't mind trying a GitHub package, I have the tagger package, which wraps NLP/openNLP to do a number of tasks quickly, in the way Python users manipulate POS tags. Note that the output prints in the traditional word/tag format, but the object is actually a list of named vectors, which makes working with the words and tags easier. Here I demo how to get the tags, plus a few manipulations that tagger makes easy:

# First load your data and get the tagger package for those playing along at home

id<-c(1,2,3,4,5,6)
review<- c("the food was very delicious and hearty - perfect to warm up during a freezing winters day", "Excellent service as usual","Love this place!", "Service and quality of food first class"," Customer services was exceptional by all staff","excellent services")
df<-data.frame(id, review)  

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/tagger")

# Now tag and manipulate

(out <- tag_pos(as.character(df[["review"]])))

## [1] "the/DT food/NN was/VBD very/RB delicious/JJ and/CC hearty/NN -/: perfect/JJ to/TO warm/VB up/RP during/IN a/DT freezing/JJ winters/NNS day/NN"
## [2] "Excellent/JJ service/NN as/IN usual/JJ"                                                                                                       
## [3] "Love/VB this/DT place/NN !/."                                                                                                                 
## [4] "Service/NNP and/CC quality/NN of/IN food/NN first/JJ class/NN"                                                                                
## [5] "Customer/NN services/NNS was/VBD exceptional/JJ by/IN all/DT staff/NN"                                                                        
## [6] "excellent/JJ services/NNS"  


c(out)                         ## True structure: list of named vectors
as_word_tag(out)               ## Match the print method (less mutable)
count_tags(out, df[["id"]])    ## Get counts by row
plot(out)                      ## tag distribution (plot at end)

as_basic(out)                  ## basic pos tags

## [1] "the/article food/noun was/verb very/adverb delicious/adjective and/conjunction hearty/noun -/. perfect/adjective to/preposition warm/verb up/preposition during/preposition a/article freezing/adjective winters/noun day/noun"
## [2] "Excellent/adjective service/noun as/preposition usual/adjective"                                                                                                                                                               
## [3] "Love/verb this/adjective place/noun !/."                                                                                                                                                                                       
## [4] "Service/noun and/conjunction quality/noun of/preposition food/noun first/adjective class/noun"                                                                                                                                 
## [5] "Customer/noun services/noun was/verb exceptional/adjective by/preposition all/adjective staff/noun"                                                                                                                            
## [6] "excellent/adjective services/noun"          


select_tags(out, c("NN", "NNP", "NNPS", "NNS"))

## [1] "food/NN hearty/NN winters/NNS day/NN"   
## [2] "service/NN"                             
## [3] "place/NN"                               
## [4] "Service/NNP quality/NN food/NN class/NN"
## [5] "Customer/NN services/NNS staff/NN"      
## [6] "services/NNS"
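To see why a list of named vectors is convenient, here is a rough base R sketch of the same kind of noun selection and tag counting. It is not how tagger is implemented. The `tagged` list is built by hand from the first five tokens of review 1 and all of review 2, and it assumes the tags are the names and the words are the values, which may not match the actual object exactly.

```r
# Hand-built imitation of a tagged object: names are POS tags, values are words.
tagged <- list(
  c(DT = "the", NN = "food", VBD = "was", RB = "very", JJ = "delicious"),
  c(JJ = "Excellent", NN = "service", IN = "as", JJ = "usual")
)

noun_tags <- c("NN", "NNP", "NNPS", "NNS")

# Keep only the words whose tag is one of the noun tags.
nouns <- lapply(tagged, function(x) x[names(x) %in% noun_tags])

# Count how often each tag occurs in each review.
tag_counts <- lapply(tagged, function(x) table(names(x)))

nouns
tag_counts
```

Because the tags are just names, ordinary subsetting and table() are enough for this kind of filtering and counting.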

[Figure: tag distribution plot produced by plot(out)]

Everything works pretty nicely within a magrittr pipeline as well, which is my preference. The Examples Section of the README has a nice overview of the package's usage.

Tyler Rinker