How to grep for all-but-one matching columns in R

Question

I am trying to subset a large data frame to my columns of interest. I do so using the grep function, but it selects one column too many ("has_socio"), which I would like to remove.

The following code does exactly what I want, but I find it unpleasant to look at. I want to do it in one line. Aside from just calling the first subset inside the second subset, can it be optimized?

library(foreign)  # provides read.dta()
DF <- read.dta("./big.dta")

DF0 <- na.omit(subset(DF, select=c(other_named_vars, grep("has_",names(DF)))))
DF0 <- na.omit(subset(DF0, select=-c(has_socio)))
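To make the problem reproducible without big.dta, here is a minimal sketch of the same two-step subset on a small hypothetical data frame. The columns id, age, has_bacon, has_ham and has_socio are invented for illustration, with id and age standing in for other_named_vars.

```r
# Hypothetical stand-in for the data read from big.dta
DF <- data.frame(
  id        = 1:4,
  age       = c(30, NA, 45, 52),
  has_bacon = c(TRUE, FALSE, TRUE, NA),
  has_ham   = c(FALSE, TRUE, TRUE, TRUE),
  has_socio = c(TRUE, TRUE, FALSE, FALSE)
)

# Step 1: keep the named columns plus every column whose name contains "has_"
DF0 <- na.omit(subset(DF, select = c(id, age, grep("has_", names(DF)))))

# Step 2: drop the one unwanted match
DF0 <- na.omit(subset(DF0, select = -c(has_socio)))

names(DF0)  # c("id", "age", "has_bacon", "has_ham")
```

Rows 2 and 4 are removed by na.omit() because of their NA in age and has_bacon respectively.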

I know similar questions have been asked (e.g. Subsetting a dataframe in R by multiple conditions) but I do not find one that addresses this issue specifically. I recognize I could just write the grep RE more carefully, but I feel the above code more clearly expresses my intent.

Thanks.

BrodieG · Accepted Answer · 2014-02-15T23:59:17.843

4

Replace your grep with:

vec <- c("blah", "has_bacon", "has_ham", "has_socio")
# keep names that start with "has_", except exactly "has_socio"
grep("^has_(?!socio$)", vec, value = TRUE, perl = TRUE)
# [1] "has_bacon" "has_ham"

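(?!...) is a negative lookahead: once the pattern has matched the piece in front of it (here ^has_), it checks that what comes next does not match its contents (here socio followed by the end of the string), without consuming any characters. Lookahead is a Perl-style feature that R's default regex engine does not support, which is why perl = TRUE is needed.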
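Here is a sketch of the lookahead pattern applied to the question's one-line goal, assuming a small hypothetical data frame (the column names are invented; id and age stand in for other_named_vars). It also shows one behavioural difference from the two-step version: because has_socio is dropped before na.omit() runs, rows whose only NA is in has_socio are kept, whereas the original code removes them.

```r
# Hypothetical data; row 3 has its only NA in has_socio
DF <- data.frame(
  id        = 1:4,
  age       = c(30, NA, 45, 52),
  has_bacon = c(TRUE, FALSE, TRUE, NA),
  has_ham   = c(FALSE, TRUE, TRUE, TRUE),
  has_socio = c(TRUE, TRUE, NA, FALSE)
)

# One line: named columns plus the lookahead matches (perl = TRUE for (?!...))
one_line <- na.omit(DF[, c("id", "age",
  grep("^has_(?!socio$)", names(DF), value = TRUE, perl = TRUE))])

# The question's two-step version, for comparison
two_step <- na.omit(subset(DF, select = c(id, age, grep("has_", names(DF)))))
two_step <- na.omit(subset(two_step, select = -c(has_socio)))

nrow(one_line)  # 2: rows 1 and 3 survive
nrow(two_step)  # 1: row 3 is also dropped because of its NA in has_socio
```

If the original row-dropping behaviour is wanted, call na.omit() before removing has_socio.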

edited Feb 15 '14 at 23:59

answered Feb 15 '14 at 23:47

BrodieG


It seems the correct way is indeed to make my RE more specific. I used standard (non-Perl) regex syntax, which is a bit shorter: `grep("has_[^s]", names(DF))`. – rjturn Feb 20 '14 at 03:34
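The bracket-expression pattern from this comment excludes every has_ column whose suffix starts with "s", not only has_socio. A small sketch comparing it with the lookahead pattern, assuming hypothetical column names (has_sausage and has_socio_2 are invented to show the difference):

```r
# Hypothetical column names probing the edges of each pattern
x <- c("has_bacon", "has_ham", "has_socio", "has_sausage", "has_socio_2")

# "has_" followed by any character other than "s": also drops has_sausage
bracket <- grep("has_[^s]", x, value = TRUE)
# c("has_bacon", "has_ham")

# Negative lookahead with the $ anchor: drops exactly "has_socio"
lookahead <- grep("^has_(?!socio$)", x, value = TRUE, perl = TRUE)
# c("has_bacon", "has_ham", "has_sausage", "has_socio_2")
```

The shorter pattern is fine as long as no other wanted column name has a suffix starting with "s".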

score 1 · Answer 2 · answered Feb 15 '14 at 23:49


vec <- c("blah", "has_bacon", "has_ham", "has_socio")
setdiff(grep("has_", vec, value = TRUE), "has_socio")
## [1] "has_bacon" "has_ham"
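A sketch of the setdiff() approach used on a data frame in one line, assuming the same kind of hypothetical columns as in the question (id and age stand in for other_named_vars). Because the exclusion is an exact name rather than a regex, no lookahead or perl = TRUE is needed, and excluding more columns only means growing the second argument.

```r
# Hypothetical data frame with the question's column layout
DF <- data.frame(
  id        = 1:4,
  age       = c(30, NA, 45, 52),
  has_bacon = c(TRUE, FALSE, TRUE, NA),
  has_ham   = c(FALSE, TRUE, TRUE, TRUE),
  has_socio = c(TRUE, TRUE, FALSE, FALSE)
)

# One line: named columns plus all "has_" matches minus the exact name "has_socio"
DF0 <- na.omit(DF[, c("id", "age",
  setdiff(grep("has_", names(DF), value = TRUE), "has_socio"))])

# Several exclusions: pass a longer vector as the second argument
fewer <- setdiff(grep("has_", names(DF), value = TRUE), c("has_socio", "has_ham"))
# "has_bacon"
```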

answered Feb 15 '14 at 23:49

Jake Burkhead

