I am trying to solve an if-else problem. My data frame has a numeric column called "sentiment". I want to add a second column, "evaluation", that assigns each row one of the words "positive", "negative", or "neutral". The criterion is: "positive" if the sentiment value in that row is above 0.25, "negative" if it is below -0.25, and "neutral" otherwise. I tried running the following if-else construction:

here's the code

Afterwards I would bind the evaluation vector to my existing data.frame; that part is not the issue, I know how to do it. The statement generates "neutral" correctly, but every row that should be "positive" or "negative" gets an NA instead. I also get the warning "invalid factor level, NA generated". The column type does not seem to be the problem, because the sentiment column is numeric. I am quite new to R and have no idea how to solve this, so any help is appreciated.

webprogrammer
D. M.
  • Please take a look at [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and modify your question to include a smaller sample taken from your data (check `?dput()`). Posting images of your data, or no data at all, makes it difficult or impossible for us to help you! – massisenergy Mar 01 '20 at 11:23
  • This means that instead of TRUE or FALSE, your `if` condition is returning NA. To get around this, either remove the NAs or add a check like `if (!is.na(x) & x > 0.25)` or whatever you had – morgan121 Mar 01 '20 at 11:32
  • Welcome to SO! Are you aware of R's vector functions, or is there a specific requirement to use a `for` loop? Also, please note that growing a data object iteratively is not recommended, as it becomes a performance issue when large objects are copied over and over again. – Uwe Mar 01 '20 at 19:09

2 Answers


M.,

I think your problem is related to calling rbind() on a character vector and a data.frame. Try this instead:

```r
# Assuming reviews is a data.frame whose column 9 holds the sentiment data
sentiment <- reviews[, 9]

# Pre-allocate a character vector (not a factor) to hold the labels
evaluation <- character(length(sentiment))

for (i in seq_along(sentiment)) {
  if (sentiment[i] > 0.25) {
    evaluation[i] <- "positive"
  } else if (sentiment[i] < -0.25) {
    evaluation[i] <- "negative"
  } else {
    evaluation[i] <- "neutral"
  }
}

# Attach the labels as a new column
reviews[["evaluation"]] <- evaluation
```
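
The warning quoted in the question can be reproduced in isolation. The sketch below is a hypothetical reconstruction, since the original code is not shown: it assumes `evaluation` ended up as a factor whose only level is "neutral". All variable names are illustrative.

```r
# Assumption: 'evaluation' is a factor that only knows the level "neutral".
evaluation <- factor(c("neutral", "neutral", "neutral"))

# Collect warnings instead of printing them, so the effect can be inspected.
warnings_seen <- character()
withCallingHandlers(
  evaluation[1] <- "positive",  # "positive" is not an existing level
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

warnings_seen         # holds the "invalid factor level" warning text
is.na(evaluation[1])  # TRUE: the value was stored as NA

# Two ways around it: work with a character vector ...
as_character <- c("neutral", "neutral", "neutral")
as_character[1] <- "positive"

# ... or declare every level up front when creating the factor.
as_factor <- factor(c("neutral", "neutral", "neutral"),
                    levels = c("negative", "neutral", "positive"))
as_factor[1] <- "positive"
```

If this is indeed what happened, building the labels as a character vector, as in the loop above, avoids the warning.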
Coy

If I understand correctly, the OP wants to add an additional column evaluation with three factor levels which depends on the numeric values in the sentiment column.

This can be achieved without using a for loop through R's vector functions.

Unfortunately, the OP has not provided a sample dataset, so we create a small one:

```r
df <- data.frame(sentiment = c(-0.5, -0.25, 0, 0.25, 0.5))
```

The cut() function can be used to convert a numeric vector to a factor. It divides the range of x into intervals and codes the values in x according to the interval they fall into:

```r
df$evaluation <- cut(df$sentiment, breaks = c(-Inf, -0.25, 0.25, Inf),
                     labels = c("negative", "neutral", "positive"))
df
#>   sentiment evaluation
#> 1     -0.50   negative
#> 2     -0.25   negative
#> 3      0.00    neutral
#> 4      0.25    neutral
#> 5      0.50   positive
```

cut() uses right-closed intervals by default, so the edge case -0.25 is mapped to "negative", which is not fully compliant with OP's requirement and OP's code sample.
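
To illustrate the edge-case behaviour, here is a small sketch comparing the default with `right = FALSE`. It rebuilds the sample data so it runs on its own; the names `right_closed` and `left_closed` are only illustrative. Switching the closed side moves the asymmetry from -0.25 to 0.25 rather than removing it.

```r
df <- data.frame(sentiment = c(-0.5, -0.25, 0, 0.25, 0.5))
breaks <- c(-Inf, -0.25, 0.25, Inf)
labels <- c("negative", "neutral", "positive")

# Default: intervals are closed on the right, e.g. (-Inf, -0.25]
right_closed <- cut(df$sentiment, breaks = breaks, labels = labels)

# right = FALSE: intervals are closed on the left, e.g. [-0.25, 0.25)
left_closed <- cut(df$sentiment, breaks = breaks, labels = labels,
                   right = FALSE)

data.frame(sentiment = df$sentiment, right_closed, left_closed)
#>   sentiment right_closed left_closed
#> 1     -0.50     negative    negative
#> 2     -0.25     negative     neutral
#> 3      0.00      neutral     neutral
#> 4      0.25      neutral    positive
#> 5      0.50     positive    positive
```

Neither setting maps both -0.25 and 0.25 to "neutral", so with these breaks cut() does not reproduce the symmetric rule.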

Alternatively, nested ifelse() calls can be used instead of cut():

```r
df$evaluation <- ifelse(df$sentiment < -0.25, "negative",
                        ifelse(df$sentiment > 0.25, "positive", "neutral"))
df
#>   sentiment evaluation
#> 1     -0.50   negative
#> 2     -0.25    neutral
#> 3      0.00    neutral
#> 4      0.25    neutral
#> 5      0.50   positive
```

This is now fully compliant with OP's requirement and OP's code sample which requires that -0.25 and 0.25 are mapped symmetrically to "neutral".
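
One of the comments raises missing values. The following sketch, using made-up data, shows how the nested ifelse() treats an NA in the input, in contrast to a scalar `if`. The "missing" label and the variable names are assumptions chosen for illustration, not part of OP's requirement.

```r
sentiment <- c(-0.5, NA, 0.1, 0.5)

# ifelse() is vectorised and propagates NA instead of failing
evaluation <- ifelse(sentiment < -0.25, "negative",
                     ifelse(sentiment > 0.25, "positive", "neutral"))
evaluation
#> [1] "negative" NA         "neutral"  "positive"

# Optionally give missing scores an explicit label (hypothetical choice)
evaluation_filled <- ifelse(is.na(sentiment), "missing", evaluation)

# A scalar if() cannot handle an NA condition and raises an error
scalar_result <- tryCatch(
  if (sentiment[2] > 0.25) "positive" else "other",
  error = function(e) "error"
)
scalar_result
#> [1] "error"
```

Whether NA should stay NA or receive its own label depends on how the column is used later.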

For the sake of completeness, there is also the case_when() function from the dplyr package, which can be used to avoid the nested ifelse() calls:

```r
library(dplyr)
df %>%
  mutate(evaluation = case_when(
    sentiment < -0.25 ~ "negative",
    sentiment >  0.25 ~ "positive",
    TRUE ~ "neutral"
  ))
#>   sentiment evaluation
#> 1     -0.50   negative
#> 2     -0.25    neutral
#> 3      0.00    neutral
#> 4      0.25    neutral
#> 5      0.50   positive
```
Uwe