I am new to R and new to working with Syuzhet.

I am trying to make a custom NRC-style lexicon to use with the Syuzhet package in order to categorize words. Although this functionality is now supposed to exist in Syuzhet, the package doesn't seem to recognize my custom lexicon. Please excuse my odd variable names and the extra libraries; I plan to use them for other things later and am just testing for now.

library(sentimentr)
library(pdftools)
library(tm)
library(readxl)
library(syuzhet)
library(tidytext)

texto <- "I am so love hate beautiful ugly"

text_cust <- get_tokens(texto)


custom_lexicon <- data.frame(lang = c("eng","eng","eng","eng"), word = c("love", "hate", "beautiful", "ugly"), sentiment = c("positive","positive","positive","positive"), value = c("1","1","1","1"))


my_custom_values <- get_nrc_sentiment(text_cust, lexicon = custom_lexicon)                             

I get the following error:

my_custom_values <- get_nrc_sentiment(text_cust, lexicon = custom_lexicon)
New names:
• value -> value...4
• value -> value...5
Error in FUN(X[[i]], ...) :
  custom lexicon must have a 'word', a 'sentiment' and a 'value' column

As far as I can tell, my data frame exactly matches that of the standard NRC library, containing columns labeled 'word', 'sentiment', and 'value'. So I'm not sure why I am getting this error.

1 Answer

The CRAN version of syuzhet's get_nrc_sentiment doesn't accept a lexicon argument; get_sentiment does. Your custom_lexicon also has an error: the values need to be numeric (for example 1), not character strings (for example "1"). And to use your own lexicon, you need to set method = "custom"; otherwise the custom lexicon is ignored. The code below works with syuzhet alone.

library(syuzhet)

texto <- "I am so love hate beautiful ugly"

text_cust <- get_tokens(texto)
custom_lexicon <- data.frame(lang = c("eng","eng","eng","eng"), 
                             word = c("love", "hate", "beautiful", "ugly"), 
                             sentiment = c("positive","positive","positive","positive"), 
                             value = c(1,1,1,1))
get_sentiment(text_cust, method = "custom", lexicon = custom_lexicon)    

[1] 0 0 0 1 1 1 1
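
If the lexicon comes from a file or from code like that in the question, the value column may end up as character. Here is a minimal sketch, assuming only base R, of checking and converting that column before handing the lexicon to get_sentiment. The data frame mirrors the one in the question; nothing in this snippet is specific to syuzhet.

```r
# Lexicon as defined in the question: 'value' is stored as character
custom_lexicon <- data.frame(
  lang = c("eng", "eng", "eng", "eng"),
  word = c("love", "hate", "beautiful", "ugly"),
  sentiment = c("positive", "positive", "positive", "positive"),
  value = c("1", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Check the column type before using the lexicon
is.numeric(custom_lexicon$value)  # FALSE for the data frame above

# Convert in place; as.numeric() turns "1" into 1
custom_lexicon$value <- as.numeric(custom_lexicon$value)
is.numeric(custom_lexicon$value)  # TRUE after the conversion
```

After the conversion, the lexicon should be usable with method = "custom" in the same way as the hand-written numeric version.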
phiver
  • Thank you very much for your comment. I am using the GitHub version of Syuzhet, which to my understanding is supposed to allow custom lexicons for the get_nrc_sentiment function. My goal is to categorize words using custom identifiers like the emotion categories in NRC. That being said, I suppose I could assign an integer value to each category and deconvolute it after the fact. Thank you for pointing out my error! – Alexandra Hudson Aug 16 '22 at 13:42
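
For the categorization goal mentioned in the comment, one possible alternative is a plain lookup in base R, which avoids both the integer encoding and get_nrc_sentiment. This is only a sketch and not how syuzhet works internally. The category names are hypothetical, and strsplit() stands in for get_tokens() so that the snippet runs without any packages.

```r
# Hypothetical lexicon with custom categories (names are illustrative only)
category_lexicon <- data.frame(
  word = c("love", "hate", "beautiful", "ugly"),
  category = c("affection", "hostility", "aesthetic", "aesthetic"),
  stringsAsFactors = FALSE
)

# Tokens as plain lowercase words; strsplit() stands in for get_tokens()
texto <- "I am so love hate beautiful ugly"
tokens <- tolower(strsplit(texto, "[^A-Za-z']+")[[1]])

# match() gives the lexicon row for each token, or NA if the word is absent
token_category <- category_lexicon$category[match(tokens, category_lexicon$word)]

# Count how many tokens fall into each category (NA tokens are dropped)
category_counts <- table(token_category)
```

Here token_category has one entry per token, so it lines up with the token vector in the same way the get_sentiment output does.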