How to ignore whitespace in a regular expression subject string?

Question

Is there a simple way to ignore the white space in a target string when searching for matches using a regular expression pattern? For example, if my search is for "cats", I would want "c ats" or "ca ts" to match. I can't strip out the whitespace beforehand because I need to find the begin and end index of the match (including any whitespace) in order to highlight that match and any whitespace needs to be there for formatting purposes.

score 156 · Accepted Answer · edited May 02 '14 at 14:54


You can put optional whitespace, \s*, between each pair of adjacent characters in your regex. Granted, the pattern gets a bit lengthy.

/cats/ -> /c\s*a\s*t\s*s/
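
As a minimal sketch in Python (the sample text and variable names are illustrative, not from the answer), the expanded pattern reports offsets in the original string, so any whitespace stays inside the span you highlight:

```python
import re

# Hand-expanded pattern for "cats" with optional whitespace between letters.
pattern = re.compile(r"c\s*a\s*t\s*s")

text = "I like c ats and ca ts."  # hypothetical sample input

# Each span is expressed in coordinates of the original, unmodified text.
spans = [m.span() for m in pattern.finditer(text)]
matched = [text[start:end] for start, end in spans]
```

Because the subject string is never altered, `spans` can be handed straight to whatever does the highlighting.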


Chris


answered Jan 04 '11 at 03:06

Sam Dufel


Thanks, sounds like that's the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done? – Steven Jan 04 '11 at 04:54
@Steven, see how I did it below, you can easily adapt my solution to such specific cases. – Bob May 25 '18 at 19:47
@chris I think, this regex is so strict for only cats, it can also be writing for any search of letters like this : `^([a-z]\s*)+$` – Sandeep Kaur Dec 10 '19 at 05:24

score 17 · Answer 2 · answered Dec 14 '18 at 10:29

While the accepted answer is technically correct, a more practical approach, if possible, is to just strip whitespace out of both the regular expression and the search string.

If you want to search for "my cats", instead of:

myString.match(/m\s*y\s*c\s*a\s*t\s*s\s*/g)

Just do:

myString.replace(/\s+/g,"").match(/mycats/g)

Warning: You can't automate this on the regular expression by just replacing all spaces with empty strings because they may occur in a negation or otherwise make your regular expression invalid.
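
One caveat, sketched here in Python with a made-up sample string: after stripping, the match offsets refer to the stripped copy, not to the original text, so this approach suits a yes/no test better than the highlighting case described in the question.

```python
import re

my_string = "I have my c ats here"  # hypothetical sample input

# Remove every run of whitespace, then search the compacted copy.
stripped = re.sub(r"\s+", "", my_string)
match = re.search(r"mycats", stripped)

# This span indexes into `stripped`, not into `my_string`.
span_in_stripped = match.span()
slice_of_original = my_string[span_in_stripped[0]:span_in_stripped[1]]
```

Slicing the original string with that span returns unrelated text, which is why the offsets cannot be reused for highlighting without extra bookkeeping.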

Aurimas · Answer 3 · 2014-05-12T20:40:27.330

10

Addressing Steven's comment to Sam Dufel's answer

Thanks, sounds like that's the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done?

This should do the trick:

/c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/

See this page for all the different variations of 'cats' that this matches.

You can also solve this using conditionals, but they are not supported in the JavaScript flavor of regex.
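
The same pattern can be generated rather than typed out. This is a small Python sketch, not part of the original answer; the helper name `newline_tolerant_pattern` is hypothetical:

```python
import re

def newline_tolerant_pattern(word):
    # Hypothetical helper: between letters, allow a newline followed by
    # any amount of further whitespace, or nothing at all.
    gap = r"(?:\n\s*)?"
    return re.compile(gap.join(re.escape(ch) for ch in word))

pattern = newline_tolerant_pattern("cats")

hits = {
    "cats": bool(pattern.search("cats")),
    "c\n ats": bool(pattern.search("c\n ats")),
    "ca\n ts": bool(pattern.search("ca\n ts")),
    "c ats": bool(pattern.search("c ats")),  # no newline, so no match
}
```

`re.escape` keeps the helper safe if the search word ever contains regex metacharacters.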

edited May 12 '14 at 20:40

answered Mar 11 '12 at 00:43

Aurimas


So very ugly. There must be a better way. – james.garriss Jun 18 '15 at 12:13
You could make it more readable in JS syntax (though the technique would work in other languages) with: `new RegExp('cats'.split('').join('(?:\\n\\s*)?'))` – brianary Nov 01 '17 at 15:46

it's crazy that regex itself doesnt have something like a "tolerant search" or so. I'm dealing with RegEx for 15 years now, and it's still a mess :/ – Sliq Nov 12 '20 at 14:19

Kludge · Answer 4 · 2011-01-04T03:24:55.247

7

You could put \s* between every character in your search string, so if you were looking for cats you would use c\s*a\s*t\s*s

It's long but you could build the string dynamically of course.

You can see it working here: http://www.rubular.com/r/zzWwvppSpE
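
Building the pattern dynamically might look like the following Python sketch. The helper name `whitespace_tolerant_pattern` is hypothetical, and escaping each character is an added precaution that the answer does not mention:

```python
import re

def whitespace_tolerant_pattern(literal):
    # Hypothetical helper: drop whitespace from the search text, escape each
    # remaining character, and allow optional whitespace between them.
    chars = [re.escape(ch) for ch in literal if not ch.isspace()]
    return re.compile(r"\s*".join(chars))

pattern = whitespace_tolerant_pattern("my cats")
match = pattern.search("oh m y ca ts!")

# The span refers to the original subject string, whitespace included.
found = match.group(0)
span = match.span()
```

This treats the search text as a literal string; it is not meant for inserting \s* into an arbitrary existing regex.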

edited Jan 04 '11 at 03:24

answered Jan 04 '11 at 03:09

Kludge


score 4 · Answer 5 · answered Jan 04 '11 at 14:07


If you only want to allow spaces, then

\bc *a *t *s\b

should do it. To also allow tabs, use

\bc[ \t]*a[ \t]*t[ \t]*s\b

Remove the \b anchors if you also want to find cats within words like bobcats or catsup.
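
A quick Python check of both variants, using made-up sample strings, shows the effect of the \b anchors and of limiting the gap to spaces and tabs:

```python
import re

bounded = re.compile(r"\bc[ \t]*a[ \t]*t[ \t]*s\b")
unbounded = re.compile(r"c[ \t]*a[ \t]*t[ \t]*s")

samples = ["c a\tts", "bobcats", "c\nats"]  # hypothetical inputs

# "bobcats" only matches without the anchors; "c\nats" never matches
# because a newline is not in the [ \t] character class.
bounded_hits = [bool(bounded.search(s)) for s in samples]
unbounded_hits = [bool(unbounded.search(s)) for s in samples]
```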


Tim Pietzcker


Bob · Answer 6 · 2018-05-25T18:02:41.207

This approach can be automated (the example below is in Python, but it can be ported to any language):

You can strip the whitespace beforehand AND save the positions of the non-whitespace characters, then use those positions to map the match boundaries back to the original string:

import re

def regex_search_ignore_space(regex, string):
    no_spaces = ''
    char_positions = []

    for pos, char in enumerate(string):
        if re.match(r'\S', char):  # upper \S matches non-whitespace chars
            no_spaces += char
            char_positions.append(pos)

    match = re.search(regex, no_spaces)
    if not match or match.start() == match.end():
        return None  # no match, or an empty match with nothing to map back

    # match.start() and match.end() are indices of start and end
    # of the found string in the spaceless string
    # (as we have searched in it).
    start = char_positions[match.start()]  # in the original string
    # Map the last matched character and add 1, so the slice neither runs
    # past the end of char_positions nor picks up trailing whitespace.
    end = char_positions[match.end() - 1] + 1  # in the original string
    matched_string = string[start:end]

    # the match WITH spaces is returned.
    return matched_string

with_spaces = 'a li on and a cat'
print(regex_search_ignore_space('lion', with_spaces))
# prints 'li on'

If you want to go further, you can construct a match object and return it instead, which makes the helper more convenient to use.

The performance of this function can of course be optimized; this example only shows the path to a solution.
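
One possible shape for such a richer return value, as a sketch only: the `OriginalMatch` tuple and the function name `search_ignore_space` are hypothetical, and returning None for an empty match is an assumption made here, not something the answer specifies.

```python
import re
from typing import NamedTuple, Optional

class OriginalMatch(NamedTuple):
    start: int  # offset of the first matched character in the original string
    end: int    # offset one past the last matched character
    text: str   # matched text, including any interior whitespace

def search_ignore_space(regex, string) -> Optional[OriginalMatch]:
    # Remember where each non-whitespace character sits in the original.
    positions = [i for i, ch in enumerate(string) if not ch.isspace()]
    compact = "".join(string[i] for i in positions)

    match = re.search(regex, compact)
    if match is None or match.start() == match.end():
        return None  # assumption: an empty match gives nothing to highlight

    start = positions[match.start()]
    # Map the last matched character and add 1, which also works when the
    # match ends at the very end of the string.
    end = positions[match.end() - 1] + 1
    return OriginalMatch(start, end, string[start:end])

result = search_ignore_space("lion", "a li on and a cat")
```

The `start` and `end` fields can be used directly to highlight the match in the original text.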

This is the only one that properly worked for me. – ZhouW Jan 06 '21 at 06:27

score 0 · Answer 7 · edited Jun 30 '22 at 14:19

The accepted answer will not work when you pass a dynamic value (such as the "current value" in an array loop) as the regex test value, because you cannot insert the optional whitespace without ending up with some really ugly regex. Konrad Hoffner's solution is therefore better in such cases, as it strips whitespace from both the regex and the test string. The test is then conducted as though neither contains whitespace.
