I'm trying to run sentiment analysis on a CSV file of tweets. After actually scoring all of the cells in the CSV file using the get_nrc_sentiment function, I run into "x must be numeric" errors, and I cannot figure out why. I'm entering the commands correctly and the actual sentiment assessment portion seems to function just fine. Once I get into more specific usages, though, I run into that error for both colSums and mutate_impl.

I have been following this tutorial here: https://rpubs.com/cosmopolitanvan/r_isis_tweets_analytics

Small note: I am very green at this. VERY green. As such, looking at other iterations of this problem has me rather confused...

Specifically, here is what is going on:

Once I get past the actual sentiment portion, I'm meant to graph the linguistic sentiments (anger, anticipation, positive, negative, etc.). To do so, I follow this:

alltweets$clean_text <- str_replace_all(alltweets$text, "@\\w+", "")

Sentiment <- get_nrc_sentiment(alltweets$clean_text)

alltweets_senti <- cbind(alltweets, Sentiment)

sentimentTotals <- data.frame(colSums(alltweets_senti[,c(11:18)]))

names(sentimentTotals) <- "count"

sentimentTotals <- cbind("sentiment" = rownames(sentimentTotals), sentimentTotals)

rownames(sentimentTotals) <- NULL

At the data.frame step, I get a colSums error (x must be numeric). If I simply pass Sentiment to colSums instead of the column subset, the ggplot graph works just fine and looks mostly like the tutorial (minus the numbers on the left presenting as 2E+05 and so on, which is whatever).
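
One possible explanation, offered as an assumption since the real column layout is not shown: `c(11:18)` only points at the sentiment columns if the CSV has exactly as many columns as the tutorial's data. If it does not, the subset can include text or factor columns, and `colSums` then stops with "'x' must be numeric". The sketch below uses a made-up data frame (`toy_senti` and its values are hypothetical) to reproduce that and to show selection by column name instead of position.

```r
# Hypothetical stand-in for alltweets_senti; names and values are made up.
toy_senti <- data.frame(
  text     = c("a", "b", "c"),
  created  = c("2019-04-01 10:05:00", "2019-04-01 10:40:00", "2019-04-01 11:15:00"),
  anger    = c(1, 0, 2),
  joy      = c(0, 3, 1),
  negative = c(1, 0, 2),
  positive = c(0, 3, 1),
  stringsAsFactors = FALSE
)

# A positional subset that happens to include the two character columns
# reproduces the error (captured here with try() so the script keeps running).
bad <- try(colSums(toy_senti[, 1:4]), silent = TRUE)
inherits(bad, "try-error")

# Selecting the sentiment columns by name does not depend on the layout.
sentiment_cols <- c("anger", "joy", "negative", "positive")
sentimentTotals <- data.frame(count = colSums(toy_senti[, sentiment_cols]))
sentimentTotals <- cbind(sentiment = rownames(sentimentTotals), sentimentTotals)
rownames(sentimentTotals) <- NULL
sentimentTotals
```

With the objects from the question, `names(Sentiment)` (or a subset of it) could play the role of `sentiment_cols`, which would also be consistent with `colSums(Sentiment)` working on its own.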

After that, I run this:

posnegtime <- alltweets_senti %>%
  group_by(created = cut(created, breaks="1 hour")) %>%
  summarise(negative = mean(negative),
            positive = mean(positive)) %>% melt

Once more, I get the "x must be numeric" error, this time as a mutate_impl(.data, dots) evaluation error.
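
A plausible cause here, again an assumption because the type of `created` is not shown: `cut(created, breaks="1 hour")` only understands "1 hour" when `created` is a date-time (POSIXct). If the CSV import left it as character or factor, `cut` falls back to its default method, which stops with "'x' must be numeric". The base-R sketch below uses a hypothetical `tweets` data frame and an assumed timestamp format to show the failure and one way around it.

```r
# Hypothetical data; the timestamp format is an assumption.
tweets <- data.frame(
  created  = c("2019-04-01 10:05:00", "2019-04-01 10:40:00", "2019-04-01 11:15:00"),
  negative = c(1, 0, 2),
  positive = c(0, 3, 1),
  stringsAsFactors = FALSE
)

# On character input, cut() dispatches to cut.default and fails.
failed <- try(cut(tweets$created, breaks = "1 hour"), silent = TRUE)
inherits(failed, "try-error")

# Parse the timestamps first; as.character() also covers factor columns.
tweets$created <- as.POSIXct(as.character(tweets$created),
                             format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Now cut() bins by hour, and the means can be taken per bin.
tweets$hour <- cut(tweets$created, breaks = "1 hour")
hourly <- aggregate(cbind(negative, positive) ~ hour, data = tweets, FUN = mean)
hourly
```

If this is the cause, converting `created` once before the `group_by` step should let the original pipeline run unchanged; the format string would need to match however the timestamps appear in the CSV.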

I don't know what will happen after that, since I cannot figure out what could be wrong with this. To a lot of you, I suspect this will seem super easy and no big deal, but boy is it throwing me for a loop!

Any advice/help on this would be greatly appreciated. I wish I could come with a bit more experience under my belt, but, well, let's just say I wasn't meant to do all of this on my own and now I am...

  • Difficult to answer without seeing [some example data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). But the error basically means what it says: you're trying to perform numerical operations on columns that contain something other than numbers. Start with `str(alltweets)` or `str(Sentiment)` and see what the variable type of each column is. – neilfws Apr 03 '19 at 23:00
  • Hi Neil. Thanks for the response. Meant to get to you sooner, but I was running it all again just to see if I messed something up. Nope. So I ran str for both. Sentiment is all numbers. It lists the categories as num with digits following. The other, however, is variable. The relevant string should be "text" or "clean_text." The first is listed as Factor w/ 582699 levels and then the tweets. The latter is listed as chr and then the tweets. – Shaun Duke Apr 03 '19 at 23:54
  • It seems like what is happening is the text or clean_text columns are definitely not numerical because they are all text. So the inputs for step one work fine with just Sentiment, but later the input wants to make sure individual sentiment assessments are associated with each individual line. And so it is reading the text lines as the thing being assessed rather than the thing doing the organizing. I just don't know how to fix that. Then again, I could be wrong. – Shaun Duke Apr 03 '19 at 23:57
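
The type check suggested in the comments can be made a little more targeted by listing each column's position next to its class, so it is easy to see what a positional subset such as `11:18` would pick up. This is a minimal sketch on a made-up data frame (`toy` and its columns are hypothetical), not the asker's data.

```r
# Hypothetical data frame; stringsAsFactors = TRUE mimics how older R
# versions read text columns from a CSV.
toy <- data.frame(
  text     = c("first tweet", "second tweet"),
  created  = c("2019-04-01 10:05:00", "2019-04-01 11:15:00"),
  anger    = c(1, 0),
  positive = c(0, 3),
  stringsAsFactors = TRUE
)

# Compact overview of every column's type.
str(toy)

# Position, name, and class side by side.
col_info <- data.frame(
  position = seq_along(toy),
  name     = names(toy),
  class    = vapply(toy, function(col) class(col)[1], character(1)),
  row.names = NULL
)
print(col_info)

# Names of the columns that numeric functions such as colSums() can handle.
numeric_cols <- names(toy)[vapply(toy, is.numeric, logical(1))]
numeric_cols
```

Running the same three steps on `alltweets_senti` would show whether positions 11 to 18 are all numeric and what class `created` has.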

0 Answers