Detecting a word in a string in R

Question

I have the BigQuery dataset of Reddit comments. It has multiple columns, one of which is the body column containing the actual comment text. I now want to search the body column for a certain word, such as a brand mention like "BMW", and create a subset of the rows that contain "BMW" in data$body.

The dataset looks similar to this:

str(data)
data.frame: 75519 obs. of 113 variables
$ body: chr "...." .....
$ name: Factor w/ 22805 levels ....
....

I know the SQL command, which looks like this:

SELECT * FROM dataset
WHERE body contains "BMW"

Is there a similar command in R?

Thank you very much!

EDIT: The solution is

 bmw <- data[grep("BMW", data$body),]

Thanks to charleslmh
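A minimal reproducible sketch of that solution, assuming a small made-up data frame in place of the Reddit data (the rows below are hypothetical; the real data has 113 columns). `grep()` returns the row numbers whose body matches, and using them as the row index keeps every column:

```r
# Hypothetical stand-in for the Reddit comments data frame.
data <- data.frame(
  body = c("Just bought a BMW", "Audi vs Mercedes", "My BMW broke down", "No cars here"),
  name = factor(c("t1_a", "t1_b", "t1_c", "t1_d")),
  stringsAsFactors = FALSE
)

# grep() gives the positions of matching rows; indexing the rows with them
# (and leaving the column index empty) keeps all columns.
bmw <- data[grep("BMW", data$body), ]
print(bmw)
```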

Possible duplicate of [Test if characters in string in R](http://stackoverflow.com/questions/10128617/test-if-characters-in-string-in-r) — James Elderfield, Jul 20 '16 at 09:03
I just tried grepl("BMW", data$body), which gives me only a logical vector of TRUE/FALSE values. I would like to have the rows containing "BMW" in data$body as a subset. Do you know how to do that? — Arthur Pennt, Jul 20 '16 at 09:07
Can I use these numerical positions from grep to make a subset of the original data frame? In the end I want a new dataset containing the rows whose body column contains "BMW", with all the other columns of the original dataset. — Arthur Pennt, Jul 20 '16 at 09:17
If there's a solution, please post it as an answer. It's better for other users and the site in general. — catastrophic-failure, Jul 20 '16 at 11:56
`grep` gives a vector of the numerical positions of the matches, which is usually shorter than its input. `grepl` gives a vector of TRUE and FALSE of the same length as its 2nd argument. `grepl` is very useful when doing selections with `[` or `[[`. — IRTFM, Jul 20 '16 at 16:45
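To make that difference concrete, here is a small made-up character vector (hypothetical values) run through both functions. Wrapping the `grepl()` result in `which()` recovers the same positions that `grep()` returns:

```r
body <- c("Just bought a BMW", "Audi vs Mercedes", "My BMW broke down", "No cars here")

# grep(): integer positions of the matching elements
grep("BMW", body)           # 1 3

# grepl(): one TRUE/FALSE per element, same length as the input
grepl("BMW", body)          # TRUE FALSE TRUE FALSE

# which() converts the logical vector back into positions
which(grepl("BMW", body))   # 1 3
```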

score 2 · Answer 1 · answered Jul 20 '16 at 13:24


The solution is

bmw <- data[grep("BMW", data$body),]

Thanks to charleslmh

answered by Arthur Pennt

score 1 · Answer 2 · answered Jul 20 '16 at 17:02

Either of these would succeed:

bmw <- data[ grep("BMW", data$body), ]  # numerical indexing
bmw <- data[ grepl("BMW", data$body), ] # logical indexing

The second one works because the "[" function, when given a logical vector in its first ("i") position, selects the rows where that vector is TRUE.
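Note that "BMW" is matched as a case-sensitive substring, so it also matches "BMWs" and misses "bmw". If whole-word, case-insensitive matching is what you need, the sketch below (on a made-up data frame with hypothetical rows) uses a `\\b` word-boundary pattern with `perl = TRUE` and `ignore.case = TRUE`. It also shows base R's `subset()`, which takes the same logical condition and reads much like the SQL `WHERE` clause:

```r
# Hypothetical stand-in for the comments data.
data <- data.frame(
  body  = c("Just bought a BMW", "bmw owners unite", "BMWs are pricey", "Audi only"),
  score = c(5, 2, 7, 1),
  stringsAsFactors = FALSE
)

# Plain substring match, case-sensitive: keeps rows 1 and 3
# ("BMWs" matches, "bmw" does not).
bmw_substr <- data[grepl("BMW", data$body), ]

# Whole word, any case: \\b is a word boundary, so "BMWs" no longer
# matches while "bmw" now does (rows 1 and 2).
pattern  <- "\\bBMW\\b"
bmw_word <- data[grepl(pattern, data$body, ignore.case = TRUE, perl = TRUE), ]

# subset() evaluates the condition inside the data frame, so `body`
# can be used without `data$`, similar to SQL's WHERE.
bmw_subset <- subset(data, grepl(pattern, body, ignore.case = TRUE, perl = TRUE))
```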
