Remove all rows which do not contain a specific string in R

Question

I'm still learning how to use R, and I'm facing an issue that I haven't been able to find an answer for yet.

In my dataframe ("data"), each row is one trial by one participant on a given task. The columns contain different information about these participants. It looks a little bit like this:

Participant    Age     Sex    Trial.Type       correct
     P01       26       0       test              1
     P01       26       0       test              0
     P01       26       0       control           1
     P02       32       1       test              1
     P02       32       1       control           1
     P02       32       1       demographics      NA

I would like to create a new dataframe df. In this dataframe, I would like to remove all the rows that do NOT contain the string "test" in the data$Trial.Type column.

I have seen that in order to remove all the rows that contain a specific string, I could use the following line:

df <- data[-grep("test", data$Trial.Type),]

This works well for removing all rows that contain the "test" string, but I would like to do the opposite: keep only the rows with the "test" string and remove all the others (and in a more efficient way than running the line above once for each non-"test" string).

I hope I was clear enough and followed the rules; it's my first post on StackOverflow.

In case we want exact match, just use: `dataSubset <- data[ data$Trial.Type == "test", ]` — zx8754, Jan 30 '18 at 12:07
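
The difference between the pattern match and the exact match suggested in the comment above can be sketched as follows. The vector `trial_type` and its "pretest" and NA values are made up for illustration; neither appears in the question's data. This is only a minimal sketch of how each approach treats near-miss and missing values.

```r
# Hypothetical vector (not from the question) with a near-miss and a missing value
trial_type <- c("test", "pretest", "control", NA)

grepl("test", trial_type)     # TRUE  TRUE FALSE FALSE  substring match, "pretest" also matches
trial_type == "test"          # TRUE FALSE FALSE    NA  exact match, NA propagates
trial_type %in% "test"        # TRUE FALSE FALSE FALSE  exact match, NA counts as no match
grepl("^test$", trial_type)   # TRUE FALSE FALSE FALSE  exact match via an anchored regex
```

If the column can contain NA, indexing a data frame with `==` produces a row of NAs for each missing value, so `%in%` or an anchored `grepl()` may be the safer choice in that case.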

score 5 · Accepted Answer · answered Jan 30 '18 at 12:04

df <- data[grep("test", data$Trial.Type),]

Explanation

grep returns the indices of the elements that match the pattern, in your case "test". Negating those indices excludes the matching rows (see In R, what does a negative index do?), while using them as they come (i.e. as positive indices) returns only the matching rows and excludes everything else.
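
A small reproducible sketch of the explanation above, with the data frame rebuilt by hand from the table in the question. The variable names `idx`, `rest`, `df2` and `none`, and the pattern "nomatch", are introduced here only for illustration. It also shows `grepl()`, a logical-vector alternative that behaves more predictably when the pattern matches nothing.

```r
# Sample data reconstructed from the table in the question
data <- data.frame(
  Participant = c("P01", "P01", "P01", "P02", "P02", "P02"),
  Age         = c(26, 26, 26, 32, 32, 32),
  Sex         = c(0, 0, 0, 1, 1, 1),
  Trial.Type  = c("test", "test", "control", "test", "control", "demographics"),
  correct     = c(1, 0, 1, 1, 1, NA),
  stringsAsFactors = FALSE
)

idx  <- grep("test", data$Trial.Type)  # positions of the matching rows: 1 2 4
df   <- data[idx, ]                    # positive indices: keep only the matches
rest <- data[-idx, ]                   # negative indices: drop the matches

# grepl() returns TRUE/FALSE per row instead of positions
df2  <- data[grepl("test", data$Trial.Type), ]       # same rows as df
none <- data[!grepl("nomatch", data$Trial.Type), ]   # no match, so all 6 rows are kept

# Pitfall: when nothing matches, grep() returns integer(0), and
# data[-integer(0), ] selects zero rows rather than all of them.
nrow(data[-grep("nomatch", data$Trial.Type), ])      # 0
```

For the negated case in particular, `!grepl(...)` avoids silently returning an empty data frame when the pattern is absent.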
