
I want to filter the rows of my df by the words in comment_1 and comment_2 and assign each row an indicator of 0 or 1 as the final result. The indicator follows these rules:

  1. If comment_1 contains apple and comment_2 also contains apple, compute price_1 minus price_2. If the difference is greater than 20, the result is 1; otherwise it is 0.

  2. If comment_1 contains orange and comment_2 contains apple, or comment_1 contains apple and comment_2 contains orange, also compute price_1 minus price_2. If the difference is greater than 10, the result is 1; otherwise it is 0.
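The two rules can be sketched in base R as one vectorised function. This is only a sketch under stated assumptions: `flag_price_gap` is a hypothetical name, "contains" is taken to mean a case-insensitive substring match via `grepl(..., ignore.case = TRUE)`, rule 1 is assumed to win if a row matches both rules, and rows matching neither rule are left as NA.

```r
# Hypothetical helper: applies rule 1 and rule 2 to whole vectors at once.
flag_price_gap <- function(comment_1, comment_2, price_1, price_2) {
  # Case-insensitive "does x contain word?" test (assumption: substring match)
  has <- function(x, word) grepl(word, x, ignore.case = TRUE)
  gap <- price_1 - price_2

  # Rule 1: apple in both comments
  rule_1 <- has(comment_1, "apple") & has(comment_2, "apple")
  # Rule 2: orange/apple in either order
  rule_2 <- (has(comment_1, "orange") & has(comment_2, "apple")) |
            (has(comment_1, "apple") & has(comment_2, "orange"))

  out <- rep(NA_integer_, length(gap))          # neither rule -> NA
  out[rule_2] <- as.integer(gap[rule_2] > 10)   # rule 2 threshold
  out[rule_1] <- as.integer(gap[rule_1] > 20)   # rule 1 threshold, wins on overlap
  out
}

# Third row is made up to show the "neither rule" case
flag_price_gap(
  comment_1 = c("The apple is expensive", "The orange is sweet", "A banana"),
  comment_2 = c("23 The Apple was beside me", "The Apple was left behind", "A pear"),
  price_1   = c(25, 33, 50),
  price_2   = c(12, 22, 1)
)
# returns c(0L, 1L, NA)
```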

Note that case does not matter: Apple and apple, or Orange and orange, must be treated the same, so the matching has to be case-insensitive.
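In R, case-insensitive matching can be done with the `ignore.case` argument of `grepl()`, whereas a character class such as `[Aa]pple` only covers the first letter. The sketch below compares the options on a few strings; the last two strings are made up for illustration and are not from the question. Plain substring matching also finds "apple" inside "pineapple"; if only whole words should count (an assumption, the question does not say), a `\\b` word-boundary pattern is one option.

```r
x <- c("The Apple is nice", "the apple is not nice", "APPLE pie", "A pineapple")

grepl("apple", x)                                         # FALSE TRUE FALSE TRUE  (case-sensitive)
grepl("[Aa]pple", x)                                      # TRUE TRUE FALSE TRUE   (first letter only)
grepl("apple", x, ignore.case = TRUE)                     # TRUE TRUE TRUE TRUE    (any case)
grepl("\\bapple\\b", x, ignore.case = TRUE, perl = TRUE)  # TRUE TRUE TRUE FALSE   (any case, whole word)
```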

For example:

  1. In the 1st row, comment_1 contains apple and comment_2 contains Apple. price_1 minus price_2 is 13, which is less than 20, so the result is 0.

  2. In the 2nd row, comment_1 contains orange and comment_2 contains Apple. price_1 minus price_2 is 11, which is greater than 10, so the result is 1.

  3. In the 4th row (apple and Orange), price_1 - price_2 = 2, which is less than 10, so the result is 0.
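The worked examples can be checked with a few lines of R. The prices are the ones from the df below; the per-row thresholds (20 for an apple/apple row, 10 for an orange/apple pair) were read off the comments by hand, which is an assumption of this sketch rather than something the code derives.

```r
price_1 <- c(25, 33, 54, 24)
price_2 <- c(12, 22, 11, 22)

gap <- price_1 - price_2          # 13 11 43 2
threshold <- c(20, 10, 20, 10)    # rows 1 and 3: rule 1; rows 2 and 4: rule 2 (read by hand)
as.integer(gap > threshold)       # 0 1 1 0, matching the expected result column
```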

My df is below; the last column, result, holds the expected output.

price_1 <- c(25, 33, 54, 24)
price_2 <- c(12, 22, 11, 22)
itemid <- c(22203, 44412,55364, 552115)
itembds <- as.integer(c("", 21344, "", ""))
comment_1 <- c("The apple is expensive", "The orange is sweet", "The Apple is nice", "the apple is not nice")
comment_2 <- c("23 The Apple was beside me", "The Apple was left behind", "The apple had rotten", "the Orange should be fine" )
result <- c(0, 1, 1, 0)

df <- data.frame(price_1, price_2, itemid, itembds, comment_1, comment_2, result)


halfer
Elvis

2 Answers


This is a straightforward if-else problem. Here is a solution using dplyr and stringr.

library(dplyr)
library(stringr)

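df %>%
  mutate(
    # Rule 1: apple in both comments -> keep the price gap, else 0
    price_a = if_else(str_detect(comment_1, "[Aa]pple") &
                        str_detect(comment_2, "[Aa]pple"), price_1 - price_2, 0),
    # Rule 2, orange then apple -> keep the price gap, else 0
    price_o = if_else(str_detect(comment_1, "[Oo]range") &
                        str_detect(comment_2, "[Aa]pple"), price_1 - price_2, 0),
    # Rule 2, apple then orange -> keep the price gap, else keep the value above
    price_o = if_else(str_detect(comment_1, "[Aa]pple") &
                        str_detect(comment_2, "[Oo]range"), price_1 - price_2, price_o),
    # Thresholds: 1 if the rule 2 gap exceeds 10 or the rule 1 gap exceeds 20
    res_actual = if_else(price_o > 10, 1, 0),
    res_actual = if_else(price_a > 20, 1, res_actual)) %>%
  # Drop the helper columns
  select(-price_o, -price_a)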
price_1 price_2 itemid itembds              comment_1                  comment_2 result res_actual
1      25      12  22203      NA The apple is expensive 23 The Apple was beside me      0          0
2      33      22  44412   21344    The orange is sweet  The Apple was left behind      1          1
3      54      11  55364      NA      The Apple is nice       The apple had rotten      1          1
4      24      22 552115      NA  the apple is not nice  the Orange should be fine      0          0
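For readers without dplyr and stringr, the same if-else chain can be sketched in base R with `ifelse()` and `grepl()`. This is a translation, not the answer's own code; it assumes the same `[Aa]pple` / `[Oo]range` patterns are wanted and rebuilds only the columns it needs. As in the dplyr version, a row matching neither rule ends up as 0 rather than NA.

```r
# Only the columns needed, copied from the question's df
df <- data.frame(
  price_1   = c(25, 33, 54, 24),
  price_2   = c(12, 22, 11, 22),
  comment_1 = c("The apple is expensive", "The orange is sweet",
                "The Apple is nice", "the apple is not nice"),
  comment_2 = c("23 The Apple was beside me", "The Apple was left behind",
                "The apple had rotten", "the Orange should be fine")
)

a1 <- grepl("[Aa]pple", df$comment_1); a2 <- grepl("[Aa]pple", df$comment_2)
o1 <- grepl("[Oo]range", df$comment_1); o2 <- grepl("[Oo]range", df$comment_2)
gap <- df$price_1 - df$price_2

price_a <- ifelse(a1 & a2, gap, 0)                 # rule 1 gap, else 0
price_o <- ifelse((o1 & a2) | (a1 & o2), gap, 0)   # rule 2 gap (either order), else 0
df$res_actual <- ifelse(price_a > 20, 1, ifelse(price_o > 10, 1, 0))
df$res_actual                                      # 0 1 1 0
```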
knytt

Use case_when with grepl(..., ignore.case = TRUE) to test each condition.

library(dplyr)

df %>%
    mutate(result = case_when(
             # Rule 1: apple in both comments -> 1 if the price gap exceeds 20
             grepl('apple', comment_1, ignore.case = TRUE) &
             grepl('apple', comment_2, ignore.case = TRUE) ~ +(price_1 - price_2 > 20),
             # Rule 2: orange/apple in either order -> 1 if the price gap exceeds 10
             grepl('orange', comment_1, ignore.case = TRUE) &
             grepl('apple', comment_2, ignore.case = TRUE) |
             grepl('apple', comment_1, ignore.case = TRUE) &
             grepl('orange', comment_2, ignore.case = TRUE) ~ +(price_1 - price_2 > 10)))


#  price_1 price_2 itemid itembds              comment_1                  comment_2 result
#1      25      12  22203      NA The apple is expensive 23 The Apple was beside me      0
#2      33      22  44412   21344    The orange is sweet  The Apple was left behind      1
#3      54      11  55364      NA      The Apple is nice       The apple had rotten      1
#4      24      22 552115      NA  the apple is not nice  the Orange should be fine      0
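Two R details make this answer work without extra parentheses or `as.integer()`. First, `&` binds more tightly than `|`, so `A & B | C & D` is read as `(A & B) | (C & D)`, which is exactly the "either order" condition of rule 2. Second, unary `+` coerces a logical vector to integer 0/1. A quick check with made-up values:

```r
x1 <- TRUE; x2 <- TRUE; x3 <- FALSE

x1 | x2 & x3        # TRUE: parsed as x1 | (x2 & x3)
(x1 | x2) & x3      # FALSE: what you would get if | bound first

+(c(13, 43) > 20)   # 0 1, an integer vector
```

Rows that match neither condition come back as NA from `case_when()`; if 0 is preferred for those rows, a final `TRUE ~ 0L` clause can be appended.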
Ronak Shah