48

Can I search a character list for a string where I don't know how the string is cased? Or more generally, I'm trying to reference a column in a dataframe, but I don't know exactly how the columns are cased. My thought was to search names(myDataFrame) in a case-insensitive manner to return the proper casing of the column.

Michael Ohlrogge
Suraj

7 Answers

71

I would suggest the grep() function and some of its additional arguments that make it a pleasure to use.

grep("stringofinterest",names(dataframeofinterest),ignore.case=TRUE,value=TRUE)

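Without the argument value=TRUE, you get only a vector of the index positions where matches occurred. Note that grep() treats its first argument as a regular expression, so it also returns column names that merely contain the string.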
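As a minimal sketch of how this could be applied to the original question, the example below uses the built-in iris data frame (an assumption made only for illustration) and anchors the pattern with ^ and $ so that only a whole-name match is returned. The variable name col is hypothetical.

```r
# Anchored pattern: matches the whole column name only, ignoring case.
col <- grep("^species$", names(iris), ignore.case = TRUE, value = TRUE)
col
# [1] "Species"

# Use the recovered, properly cased name to pull the column.
head(iris[[col]], 3)
```

If the string of interest could contain regular expression metacharacters (such as a dot), the anchored pattern would need escaping, so an exact-match approach may be simpler in that case.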

Sam Firke
Farrel
41

Assuming that no variable names differ only in case, you can search for your all-lowercase variable name in tolower(names(myDataFrame)):

match("b", tolower(c("A","B","C")))
[1] 2

This will produce only exact matches, but that is probably desirable in this case.
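One way this idea might be wrapped up is sketched below. The helper name find_col is hypothetical (it is not part of base R), and the built-in iris data frame is used only as an example.

```r
# Hypothetical helper: return the properly cased column name,
# given a name whose casing is unknown.
find_col <- function(df, name) {
  idx <- match(tolower(name), tolower(names(df)))
  if (is.na(idx)) {
    stop("no column matching '", name, "' (ignoring case)")
  }
  names(df)[idx]
}

find_col(iris, "SPECIES")
# [1] "Species"

# The result can then be used for ordinary indexing.
head(iris[[find_col(iris, "sepal.length")]], 3)
```

Because match() returns only the first hit, this sketch relies on the same assumption as above: no two column names differ only in case.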

Sam Firke
Aniko
17

With the stringr package, you can modify the pattern with one of the built-in modifier functions (see ?modifiers). For example, since we are matching a fixed string (no special regular expression characters) but want to ignore case, we can do

str_detect(colnames(iris), fixed("species", ignore_case=TRUE))

Or you can use the (?i) case insensitive modifier

str_detect(colnames(iris), "(?i)species")
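Since str_detect() returns a logical vector rather than the names themselves, the following sketch shows one way to recover the properly cased column name. It assumes stringr is installed; the variable names cols and hits are illustrative only.

```r
library(stringr)

cols <- colnames(iris)

# Logical vector marking which names contain "species", ignoring case.
hits <- str_detect(cols, fixed("species", ignore_case = TRUE))
cols[hits]
# [1] "Species"

# str_subset() returns the matching strings directly; the anchored
# regex() pattern restricts this to whole-name matches.
str_subset(cols, regex("^species$", ignore_case = TRUE))
# [1] "Species"
```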
MrFlick
  • Also, all modifiers from `?stringr::modifiers` have `ignore_case` as the 2nd argument, so here, for example, you can type `str_detect(colnames(iris), fixed("species", ignore_case=TRUE))` – moodymudskipper Jun 14 '18 at 23:57
  • I looked into `stringr`'s doc and didn't find this behavior documented. Where did you get this from? – moodymudskipper Jun 15 '18 at 15:55
  • These are fairly standard regular expression modifiers: https://www.regular-expressions.info/modifiers.html – MrFlick Jun 15 '18 at 15:58
  • Yes, but `grepl` doesn't seem to support them, so I assumed it was coded into `stringr` or `stringi` – moodymudskipper Jun 15 '18 at 16:19
  • `grepl` does with `perl=TRUE`: `grepl("(?i)species", colnames(iris), perl=TRUE)` – MrFlick Jun 15 '18 at 16:20
  • My bad, actually `grepl("(?i)species",colnames(iris))` works, but `grepl("(?x)S pecies",colnames(iris))` doesn't while `grepl("(?x)S pecies",colnames(iris), perl=TRUE)` does. – moodymudskipper Jun 15 '18 at 16:26
  • This led me to post this bounty: https://stackoverflow.com/questions/47240375/regular-expressions-in-base-r-perl-true-vs-the-default-pcre-vs-tre – moodymudskipper Jun 15 '18 at 17:08
6

For anyone using this with %in%, simply use tolower on the right side (or on both sides), like so:

"b" %in% c("a", "B", "c")
# [1] FALSE

tolower("b") %in% tolower(c("a", "B", "c"))
# [1] TRUE
stevec
1

The searchable package was created to support various types of searching within objects:

library(searchable)        # provides searchable() and ignore.case()
l <- list( a=1, b=2, c=3 )
sl <- searchable(l)        # make the list "searchable"
sl <- ignore.case(sl)      # turn on case insensitivity

> sl['B']
$b
[1] 2

It works with lists and vectors and does a lot more than simple case-insensitive matching.

ctbrown
0

If you want to search for one set of strings in another set of strings, case insensitively, you could try:

s1 = c("a", "b")
s2 = c("B", "C")
matches = s1[ toupper(s1) %in% toupper(s2) ]
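The expression above returns the matching elements as they are cased in s1. If the casing from s2 is wanted instead (as with column names in the original question), a possible variant uses match(). The names idx and s2_matches are hypothetical.

```r
s1 <- c("a", "b")
s2 <- c("B", "C")

# Position in s2 of each element of s1, ignoring case (NA if absent).
idx <- match(toupper(s1), toupper(s2))
idx
# [1] NA  1

# Matching elements, cased as they appear in s2.
s2_matches <- s2[idx[!is.na(idx)]]
s2_matches
# [1] "B"
```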
RickN
-1

Another way of achieving this is to use str_which(string, pattern) from the stringr package:

library("stringr")
str_which(string = tolower(colnames(iris)), pattern = "species")
Tino