
I have this data frame, tk, which is a subset of my original data:

tk

> ##    document   term count    sentiment
> ## 1       111 happen     1 anticipation
> ## 2       111   time     1 anticipation
> ## 3       112 mother     1 anticipation
> ## 4       112 mother     1          joy
> ## 5       112 mother     1     negative
> ## 6       112 mother     1     positive
> ## 7       112 mother     1      sadness
> ## 8       112 mother     1        trust
> ## 9       112    sue     1        anger
> ## 10      112    sue     1     negative
> ## 11      112    sue     1      sadness
> ## 12      112  wrong     1     negative
> ## 13      113   suck     1     negative
> ## 14      114   gate     1        trust

I need to

  • add a new column (tk$positive_negative) that holds only the "positive" and "negative" values from the sentiment variable.
  • add another new column (tk$emotions) that holds every other value from the sentiment variable, i.e. anything except "positive" and "negative".

I tried a for loop, but couldn't get it to work:

for (i in tk$sentiment){
  ifelse(i=="positive",tk$positive_negative<-"positive",ifelse(i=="negative",tk$positive_negative<-"negative",tk$emotions<-paste(print(i))))
}

> ## [1] "anticipation"
> ## [1] "anticipation"
> ## [1] "anticipation"
> ## [1] "joy"
> ## [1] "sadness"
> ## [1] "trust"
> ## [1] "anger"
> ## [1] "sadness"
> ## [1] "trust"

tk

> ##    document   term count    sentiment emotions positive_negative
> ## 1       111 happen     1 anticipation    trust          negative
> ## 2       111   time     1 anticipation    trust          negative
> ## 3       112 mother     1 anticipation    trust          negative
> ## 4       112 mother     1          joy    trust          negative
> ## 5       112 mother     1     negative    trust          negative
> ## 6       112 mother     1     positive    trust          negative
> ## 7       112 mother     1      sadness    trust          negative
> ## 8       112 mother     1        trust    trust          negative
> ## 9       112    sue     1        anger    trust          negative
> ## 10      112    sue     1     negative    trust          negative
> ## 11      112    sue     1      sadness    trust          negative
> ## 12      112  wrong     1     negative    trust          negative
> ## 13      113   suck     1     negative    trust          negative
> ## 14      114   gate     1        trust    trust          negative

Please advise, thank you.

  • `ifelse` is vectorised, so there is no need for a loop. [Check this out](http://stackoverflow.com/questions/18012222/nested-ifelse-statement-in-r) for the syntax. – Sotos Oct 13 '16 at 13:58
  • Exactly as @Sotos said. Try this: `tk$positive_negative <- ifelse(tk$sentiment %in% c("positive","negative"),tk$sentiment,"")` ; `tk$emotions <- ifelse(tk$sentiment %in% c("positive","negative"),"",tk$sentiment)` – Mike H. Oct 13 '16 at 14:00

1 Answer


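See the comment by @Sotos. `ifelse` is already vectorized, which means it applies the test to every element of the vector for you, so there is no need for a loop. Vectorized functions are also generally much faster than an explicit loop. In your loop, each assignment such as `tk$positive_negative<-"negative"` writes a single value into the whole column, which is why every row ends up with the last value assigned.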
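As a minimal sketch of what "vectorized" means here, the snippet below runs `ifelse` on a small toy vector. The vector and the variable names (`sentiment`, `is_polarity`, `polarity`, `emotion`) are made up for illustration and are not the asker's data.

```r
# Hypothetical toy vector, not the asker's data
sentiment <- c("joy", "negative", "positive", "trust")

# One TRUE/FALSE per element: is this value a polarity label?
is_polarity <- sentiment %in% c("positive", "negative")

# ifelse() picks element-wise from 'yes' or 'no' according to the test,
# so a single call fills the whole result vector
polarity <- ifelse(is_polarity, sentiment, "")
emotion  <- ifelse(is_polarity, "", sentiment)

print(polarity)
print(emotion)
```

Each call returns a vector of the same length as the input, so the result can be assigned directly to a new data frame column.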

With that said, I think all you need to solve your problem is:

tk$positive_negative <- ifelse(tk$sentiment %in% c("positive","negative"),tk$sentiment,"")
tk$emotions <- ifelse(tk$sentiment %in% c("positive","negative"),"",tk$sentiment)

tk
   document   term count    sentiment positive_negative     emotions
1       111 happen     1 anticipation                   anticipation
2       111   time     1 anticipation                   anticipation
3       112 mother     1 anticipation                   anticipation
4       112 mother     1          joy                            joy
5       112 mother     1     negative          negative             
6       112 mother     1     positive          positive             
7       112 mother     1      sadness                        sadness
8       112 mother     1        trust                          trust
9       112    sue     1        anger                          anger
10      112    sue     1     negative          negative             
11      112    sue     1      sadness                        sadness
12      112  wrong     1     negative          negative             
13      113   suck     1     negative          negative             
14      114   gate     1        trust                          trust

Data:

    tk <- structure(list(document = c(111L, 111L, 112L, 112L, 112L, 112L, 
112L, 112L, 112L, 112L, 112L, 112L, 113L, 114L), term = structure(c(2L, 
6L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 7L, 4L, 1L), .Label = c("gate", 
"happen", "mother", "suck", "sue", "time", "wrong"), class = "factor"), 
    count = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L), sentiment = c("anticipation", "anticipation", "anticipation", 
    "joy", "negative", "positive", "sadness", "trust", "anger", 
    "negative", "sadness", "negative", "negative", "trust")), .Names = c("document", 
"term", "count", "sentiment"), row.names = c(NA, -14L), class = "data.frame")
Mike H.
  • Thank you for your solution, guys, but when I tried your code it gave me numbers in the new columns rather than the values. Any suggestions why? – Tarek Khedr Oct 13 '16 at 14:19
  • That's because your strings are stored as factors. Before you run the code, convert the column with `tk$sentiment <- as.character(tk$sentiment)` – Mike H. Oct 13 '16 at 14:22
  • Oh, got it. Thanks a lot for your support, much appreciated. – Tarek Khedr Oct 13 '16 at 14:24
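The factor issue from the comments can be reproduced with a small sketch. The toy vector and the names `sentiment_f`, `sentiment_c`, `codes` and `labels` are hypothetical. Before R 4.0.0, `data.frame()` and `read.csv()` converted strings to factors by default, which is probably how the asker's column became a factor.

```r
# Hypothetical toy column stored as a factor
sentiment_f <- factor(c("joy", "negative", "positive", "trust"))

# With a factor as the 'yes' argument, ifelse() drops the factor class,
# so the result holds the underlying integer codes instead of the labels
codes <- ifelse(sentiment_f %in% c("positive", "negative"), sentiment_f, "")

# Converting to character first keeps the labels
sentiment_c <- as.character(sentiment_f)
labels <- ifelse(sentiment_c %in% c("positive", "negative"), sentiment_c, "")

print(codes)
print(labels)
```

Calling `str(tk)` before running the answer's code shows whether `sentiment` is a factor or a character column.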