removing variables containing certain string in r

Question

I have hundreds of observations and I'd like to remove the ones that contain the string "english basement". I can't seem to find the right syntax to do so. I can only figure out how to keep observations with that string. For instance, I used the code below to get only the observations containing the string, and it worked perfectly:

eng_base <- zdata %>%
  filter(str_detect(ListingDescription, "english basement"))

Now I want a data set, top_10mpEB, that excludes observations containing "english basement". Your help is greatly appreciated.

Alternative dupes: https://stackoverflow.com/questions/6650510/remove-rows-from-data-frame-where-a-row-match-a-string/6650564 ;https://stackoverflow.com/questions/22249702/delete-rows-containing-specific-strings-in-r — Mike H., Feb 28 '18 at 21:44
@MikeH., thanks for noticing the duplication. I think the [first duplicate suggestion](https://stackoverflow.com/questions/6650510/remove-rows-from-data-frame-where-a-row-match-a-string/6650564) is not quite applicable here. I think it is also related to the reason Abby_studies_fish got down-voted. — Valentin_Ștefan, Feb 28 '18 at 22:04

Valentin_Ștefan · Answer 1 · 2018-02-28T21:36:20.177

I do not know what your data looks like, but maybe this example helps. I think you just need to negate the logical vector returned by str_detect:

library(dplyr)
library(stringr)
zdata <- data.frame(ListingDescription = c(rep("english basement, etc",3), letters[1:2] ))
zdata
#     ListingDescription
#1 english basement, etc
#2 english basement, etc
#3 english basement, etc
#4                     a
#5                     b
zdata %>%
  filter(!str_detect(ListingDescription, "english basement"))
#  ListingDescription
#1                  a
#2                  b

Or using the data.table package (no need for stringr::str_detect):

library(data.table)
setDT(zdata)
zdata[! ListingDescription %like% "english basement"]
#   ListingDescription
#1:                  a
#2:                  b
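Two details that the negation approach above does not cover can be sketched as follows. This is only an illustration under assumptions: the `listings` data frame and its contents are made up, the case-insensitive match is a guess at what real listing text might need, and the `negate` argument requires stringr 1.4.0 or later. Note that `str_detect()` returns `NA` for missing descriptions, and `filter()` drops rows where the condition is `NA`.

```r
library(dplyr)
library(stringr)

# Hypothetical data: mixed capitalisation and one missing description
listings <- data.frame(
  ListingDescription = c("Sunny English Basement", "english basement, etc", "a", NA),
  stringsAsFactors = FALSE
)

# Case-insensitive pattern (an assumption about what the real data needs)
pattern <- regex("english basement", ignore_case = TRUE)

# negate = TRUE is equivalent to !str_detect(...); the NA row is dropped
no_eb <- listings %>%
  filter(str_detect(ListingDescription, pattern, negate = TRUE))

# Keep rows with a missing description as well
no_eb_keep_na <- listings %>%
  filter(is.na(ListingDescription) | !str_detect(ListingDescription, pattern))
```

Whether rows with a missing description should be kept depends on the analysis; the second form just makes that choice explicit.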

Jorge · Answer 2 · 2018-02-28T21:16:56.183

You can do this using grepl():

x <- data.frame(ListingDescription = c('english basement other words description continued', 
                              'great fireplace and an english basement',
                              'no basement',
                              'a house with a sauna!',
                              'the pool is great... and wait till you see the english basement!',
                              'new listing...will go fast'),
            rent = c(3444, 23444, 346, 9000, 1250, 599))


# Keep only the rows whose description does NOT mention "english basement"
x_english_basement <- x[!grepl('english basement', x$ListingDescription), ]
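As a rough sketch of the same base R idea, the logical vector from grepl() can be stored and negated with `!`, and `subset()` gives an equivalent one-liner. The `listings` data frame below is invented for illustration, and `ignore.case = TRUE` is an assumption about how the real descriptions are capitalised.

```r
# Hypothetical data, loosely modelled on the example above
listings <- data.frame(
  ListingDescription = c("great fireplace and an english basement",
                         "no basement",
                         "Bright English Basement unit",
                         "a house with a sauna!"),
  rent = c(23444, 346, 1250, 9000),
  stringsAsFactors = FALSE
)

# TRUE where the description mentions the phrase, regardless of case
has_eb <- grepl("english basement", listings$ListingDescription,
                ignore.case = TRUE)

# Negate the logical vector to keep everything else
without_eb <- listings[!has_eb, ]

# Equivalent form using subset()
without_eb2 <- subset(listings,
                      !grepl("english basement", ListingDescription,
                             ignore.case = TRUE))
```

Unlike stringr::str_detect(), grepl() returns FALSE rather than NA for missing values, so rows with a missing description are kept by this approach.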

score -1 · Answer 3 · answered Feb 28 '18 at 21:24

You can use dplyr to easily filter your dataframe.

library(dplyr)
new_data <- data %>%
   filter(!ListingDescription=="english basement")

The ! became my best friend once I realized it meant "doesn't equal".

Abby_studies_fish


`!=` does the same thing, but with one function – Rich Scriven Feb 28 '18 at 21:29
Note that `!=` is a bit more clear. However, here you catch exactly "english basement", but if you need to catch something like "english basement and something else" this will not work. – Valentin_Ștefan Feb 28 '18 at 21:31
Thanks Valentin, I realized afterwards that his character string is perhaps a bit more extensive than only "english basement", and your answer covers all scenarios. Rich, if you want to downvote me, could you please explain yourself a bit more? Thanks – Abby_studies_fish Feb 28 '18 at 21:37
@Abby_studies_fish, I agree that it is not crystal clear what the data should look like, and your proposed solution can be valid. – Valentin_Ștefan Feb 28 '18 at 21:53
Thank you all for your help. ! seems to have done the trick. – Holi Weaver Feb 28 '18 at 22:55
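The difference discussed in these comments, exact comparison versus substring matching, can be seen in a small base R sketch. The `descriptions` vector is made up for illustration.

```r
# Hypothetical descriptions: one exact match, one partial match, one non-match
descriptions <- c("english basement", "english basement, etc", "no basement")

# Exact comparison: only a string that is exactly "english basement" is excluded
keep_exact <- descriptions != "english basement"
# FALSE  TRUE  TRUE

# Substring match: any string containing the phrase is excluded
keep_substring <- !grepl("english basement", descriptions, fixed = TRUE)
# FALSE FALSE  TRUE
```

Here `fixed = TRUE` treats the pattern as a literal string rather than a regular expression, which is enough when the phrase contains no special characters.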
