
I have recently started teaching myself sentiment analysis. I have a dataset that looks like this (screenshot: initial data).

The original data consisted of wine reviews, one per row. I tokenized it and performed a basic sentiment analysis with one of the R lexicons, as can be seen in the screenshot. Column X refers to the row number in the initial data frame. What I want to do now is calculate the net sentiment for each original row (X), that is, a number showing whether positive or negative words prevail, and attach it as a column. I have tried the following code, but it does not work:

per_row <- unigrams_all_ns %>%
  inner_join(get_sentiments("bing"), by = c("unigram" = "word")) %>%
  group_by(X) %>%
  spread(sentiment, n, fill = 0)

I get the following error:

Error: var must evaluate to a single number or a column name, not a function

Pesho
  • Please show a small reproducible example and expected output – akrun Feb 09 '18 at 09:48
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Feb 09 '18 at 09:50
  • I think you should first summarise and then spread: `group_by(X, sentiment) %>% summarise(n = n()) %>% spread(sentiment, n, fill = 0)` – Volodymyr Feb 09 '18 at 09:53
  • I am a bit confused by your statement. Do you mean you want to know how many positive and negative words exist for each level of X? If so, `count(X, sentiment)` would follow `inner_join()` in your code. If you want wide-format data, you would use `spread(key = sentiment, value = n, fill = 0)` after the `count()`. – jazzurro Feb 09 '18 at 10:04
  • @Vova Thanks that worked like a charm. – Pesho Feb 09 '18 at 10:07
  • @jazzurro Yes, I want to know how many positives and how many negatives there are for each level of X. Afterwards I will calculate the net: positive if #positives > #negatives, and so on. – Pesho Feb 09 '18 at 10:08
  • @Pesho OK. Got it. – jazzurro Feb 09 '18 at 10:14

1 Answer


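What you want to do is count how many positive and negative words exist for each group in X. The error most likely occurs because no column named n exists after inner_join(), so spread() picks up the function dplyr::n() instead of a column. You can create that column with count() from the dplyr package. Given what you tried, it seems that you want wide-format data, so I used spread(). I think you can take it further from here by yourself.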

library(dplyr)
library(tidyr)
library(tidytext)

unigrams_all_ns <- data.frame(X = c(1,2,2,2,2,3,3,3,4,4),
                              unigram = c("smooth", "snappy", "dominate", "crisp", "stainless", "lemon", 
                                          "blossom", "opulent", "rough", "pleasantly"),
                              stringsAsFactors = FALSE)

unigrams_all_ns %>%
  inner_join(get_sentiments("bing"), by = c("unigram" = "word")) %>%
  count(X, sentiment) %>%
  spread(key = sentiment, value = n, fill = 0)

      X negative positive
  <dbl>    <dbl>    <dbl>
1  1.00     0        1.00
2  2.00     0        4.00
3  3.00     1.00     2.00
4  4.00     1.00     1.00
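
The question also asks for a net score per row, which the code above stops short of. A minimal sketch of one way to get it, assuming the wide table is stored under the hypothetical name per_row (the values are copied from the output above so the snippet runs without the lexicon), and with the column names net and prevailing chosen only for illustration:

```r
library(dplyr)

# Wide-format counts copied from the output above, one row per original review (X).
# `per_row` is a hypothetical name for the result of the count() + spread() pipeline.
per_row <- data.frame(X = c(1, 2, 3, 4),
                      negative = c(0, 0, 1, 1),
                      positive = c(1, 4, 2, 1))

# net > 0: positive words prevail; net < 0: negative words prevail; net == 0: tie.
per_row_net <- per_row %>%
  mutate(net = positive - negative,
         prevailing = case_when(net > 0 ~ "positive",
                                net < 0 ~ "negative",
                                TRUE ~ "neutral"))

per_row_net
```

To attach the result to the original reviews, a left_join() on X should work, provided the original data frame also carries X. Reviews with no words in the lexicon are dropped by inner_join(), so they would come back as NA after the join. Note also that if no review contains any negative (or positive) word, spread() will not create that column, and the subtraction would fail.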
jazzurro