-2

I am looking for an R solution that checks whether a word (column 1) is present in a sentence (column 2) of a data frame. If the word is present in the sentence, it should return 1 (TRUE), otherwise 0 (FALSE). This is how my DF looks and this is how it should look.

I would be highly thankful for any sort of help.

Has QUIT--Anony-Mousse
Fraxxx
  • Possible duplicate of [How to find that a word/words in a column is present in another column consisting a sentence](https://stackoverflow.com/questions/45461707/how-to-find-that-a-word-words-in-a-column-is-present-in-another-column-consistin) – Has QUIT--Anony-Mousse Aug 02 '17 at 22:51

2 Answers

1

Use grepl():

df$t <- apply(df, 1, function(x) grepl(x[1], x[2]))
df
      substring                      string     t
1         phone this is my new mobile phone  TRUE
2        phones      Yes, I have two phones  TRUE
3 telephonessss            my old telephone FALSE
4  telephone234                   telephone FALSE

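Note that this solution uses apply() with MARGIN = 1, i.e. it works row by row. Conceptually, we want to check, for each row of the data frame, whether the substring is contained in the string. Also note that grepl() treats the substring as a regular expression unless fixed = TRUE is passed.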
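For anyone who wants to reproduce this, here is a minimal sketch. It assumes the data frame looks like the printed output above (the column names `substring` and `string` are taken from that output), and the extra column `t_int` is only an illustrative name for the 1/0 result the question asks for.

```r
# Rebuild the example data frame from the printed output above
df <- data.frame(
  substring = c("phone", "phones", "telephonessss", "telephone234"),
  string = c("this is my new mobile phone", "Yes, I have two phones",
             "my old telephone", "telephone"),
  stringsAsFactors = FALSE
)

# Row-wise check: x[1] is the substring, x[2] is the sentence
df$t <- apply(df, 1, function(x) grepl(x[1], x[2]))

# The question asks for 1/0 rather than TRUE/FALSE
# (t_int is a hypothetical column name, not part of the original answer)
df$t_int <- as.integer(df$t)
df
```

`as.integer()` maps TRUE to 1 and FALSE to 0, so no `ifelse()` is needed for the conversion.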

Demo here:

Rextester

Tim Biegeleisen
  • Using `apply` on a dataframe is a mess. You could just do `with(df, mapply(grepl, substring, string))`. And this is a dupe btw. – David Arenburg Aug 02 '17 at 11:38
  • @DavidArenburg When you say "mess" are you worried about performance, the code, or both? – Tim Biegeleisen Aug 02 '17 at 11:49
  • All of these + side effects. `apply` is only good for matrices. It messes up and behaves unexpectedly for anything else (compare `apply(iris, 2, class)` with `str(iris)`, for instance). Not to mention performance and nasty code. – David Arenburg Aug 02 '17 at 11:54
  • Hi Tim. Thanks for the solution. But I still have a small problem: my substring is itself a string (it has more than one word), and because of that apply() and grepl() are not working. For instance, in substring I have "my new phone" and in string I have "this is my new mobile phone". In that case I get FALSE in the t column. – Fraxxx Aug 02 '17 at 12:23
  • @Faraz I would recommend that you ask a new question. This new requirement is a large departure from your original requirements. – Tim Biegeleisen Aug 02 '17 at 12:46
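To make the comments above concrete, here is a rough sketch, not a definitive answer. The first part uses the `mapply` form suggested in the comment, with `fixed = TRUE` added as an assumption so the substring is matched literally. The second part shows one possible reading of the multi-word follow-up, namely "every word of the substring occurs somewhere in the sentence". The helper `all_words_present` and the column `all_words` are hypothetical names introduced only for illustration.

```r
# Two rows using the example from the follow-up comment
df <- data.frame(
  substring = c("phone", "my new phone"),
  string = c("this is my new mobile phone", "this is my new mobile phone"),
  stringsAsFactors = FALSE
)

# Element-wise literal match, one (substring, string) pair per row
df$t <- with(df, unname(mapply(grepl, substring, string,
                               MoreArgs = list(fixed = TRUE))))

# Hypothetical helper: TRUE if every space-separated word of `sub`
# occurs somewhere in `str` (literal match, order ignored)
all_words_present <- function(sub, str) {
  words <- strsplit(sub, " +")[[1]]
  all(vapply(words, grepl, logical(1), x = str, fixed = TRUE))
}

df$all_words <- unname(mapply(all_words_present, df$substring, df$string))
df
```

With this reading, "my new phone" gives FALSE for the plain match (the sentence says "my new mobile phone") but TRUE for the word-by-word check. These are still substring matches, so "phone" would also be found inside "telephone"; whether that is acceptable depends on the actual requirement.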
0

You could use stri_detect_fixed() from the stringi package.

First, I created a small data frame from two vectors:

substring <- c("phone", "phones", "telephonesss")
string <- c("this is my new mobile phone", "Yes, I have two phones","my old telephone")
df <- data.frame(substring, string)

Then I created a new column named "t" in the data frame, containing TRUE or FALSE:

 library(stringi)
 df$t <- stri_detect_fixed(df$string, df$substring)

And the output:

> df
     substring                      string     t
1        phone this is my new mobile phone  TRUE
2       phones      Yes, I have two phones  TRUE
3 telephonesss            my old telephone FALSE
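As a side note on why "fixed" matching matters here: a fixed match compares the pattern literally, while the default in grepl() is a regular expression. The small sketch below uses base R's `grepl(..., fixed = TRUE)` as a stand-in to show the difference (it does not call stringi, and the vectors `pattern` and `text` are made up for illustration).

```r
pattern <- "a.c"
text <- c("abc", "a.c")

# Regex match: "." matches any single character, so both elements match
regex_hit <- grepl(pattern, text)

# Literal match: only the element that really contains "a.c" matches
fixed_hit <- grepl(pattern, text, fixed = TRUE)

regex_hit
fixed_hit
```

If the substrings in the data frame can contain characters such as `.`, `(` or `+`, a literal match is usually the safer choice.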
Miha
  • Hi Miha. Thanks for the solution. But I still have a small problem: my substring is itself a string (it has more than one word), and because of that stri_detect_fixed is not working well. For instance, in substring I have "my new phone" and in string I have "this is my new mobile phone". In that case I get FALSE in the t column. – Fraxxx Aug 02 '17 at 12:17