2

I'm trying to write a regex that matches runs of two or more spaces, excluding leading whitespace. Take the example below:

One OS to rule them     all,
    One  OS  to  find    them.
    One     OS to call them    all,
    And  in  salvation    bind         them.
    In  the  bright  land  of  Linux,
    Where the     hackers play.

I want it to become

One OS to rule them all,
    One OS to find them.
    One OS to call them all,
    And in salvation bind them.
    In the bright land of Linux,
    Where the hackers play.

Using the regex ([ ]* ){2,} I can match two or more spaces. The problem is that it also matches the leading whitespace on lines 2-6.

Note: I want to use this regex inside IntelliJ IDEA.
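
To make the problem concrete, here is a minimal sketch of what the pattern above does. It uses Python's `re` module purely for illustration (the engine inside IntelliJ IDEA may behave differently in other respects), and the two-line `sample` string is a shortened stand-in for the text in the question.

```python
import re

# Shortened stand-in for the example text; the second line is indented by 4 spaces.
sample = "One OS to rule them     all,\n    One  OS  to  find    them."

# The pattern from the question, with every match replaced by a single space.
collapsed = re.sub(r"([ ]* ){2,}", " ", sample)

print(collapsed)
```

The inner runs are collapsed as intended, but the four-space indent on the second line is reduced to one space as well, which is the unwanted behaviour.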

mackatozis

3 Answers

3

You can use a regex like this:

\b\s+\b

Replace each match with a single space.

Working demo
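
As a rough sketch of how this behaves, assuming Python's `re` module as a stand-in engine and a shortened `sample` string in place of the original text:

```python
import re

# Shortened stand-in for the example text; the second line is indented by 4 spaces.
sample = "One OS to rule them     all,\n    One  OS  to  find    them."

# Whitespace is only matched when it sits between two word characters,
# so the indentation (preceded by a newline, not a word) is left alone.
result = re.sub(r"\b\s+\b", " ", sample)

print(result)
```

Note that `\s` also matches newlines, so a line that ends in a word character (with no trailing punctuation) would be joined to the next line. Using `[ ]+` instead of `\s+` would avoid that.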


Update for IntelliJ: it seems lookarounds aren't working in IntelliJ, so you can try this workaround instead:

(\w+ )\s+

With replacement string: $1
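
A minimal sketch of this workaround, again assuming Python's `re` module for illustration. Python writes the backreference as `\1` where IntelliJ uses `$1`; the `sample` and `after_comma` strings are made up for the demonstration.

```python
import re

# Shortened stand-in for the example text; the second line is indented by 4 spaces.
sample = "One OS to rule them     all,\n    One  OS  to  find    them."

# Keep the word and its first space (group 1), drop any extra whitespace after it.
result = re.sub(r"(\w+ )\s+", r"\1", sample)

# Extra spaces that follow punctuation are not preceded by "word + space",
# so this pattern leaves them untouched.
after_comma = re.sub(r"(\w+ )\s+", r"\1", "a,  b")

print(result)
print(after_comma)
```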

Working demo

Of course, this regex covers fewer cases (it only matches extra whitespace that follows a word and a space), but it may be enough for you.

Federico Piazza
2

In your example, you could use the word-boundary metacharacter:

\b\s{2,}

That will match two or more whitespace characters that follow the end of a word (\b also matches at the beginning of a word, but a word can't start with whitespace).

However, it fails in the more general case where multiple spaces follow a special character, since that character isn't considered part of a word.
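
Both the working case and the failing case can be sketched as follows, assuming Python's `re` module as the engine. The `sample` and `after_period` strings are illustrative, not taken from the question verbatim.

```python
import re

# Shortened stand-in for the example text; the second line is indented by 4 spaces.
sample = "One OS to rule them     all,\n    One  OS  to  find    them."

# Two or more whitespace characters directly after a word are collapsed.
result = re.sub(r"\b\s{2,}", " ", sample)

# There is no word boundary between "." and a space, so this run is missed.
after_period = re.sub(r"\b\s{2,}", " ", "them.  One")

print(result)
print(after_period)
```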

If your regex engine supports unbounded-width lookbehind, you can use the following:

(?<!^\s*)\s{2,}
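
Whether this pattern is accepted depends on the engine. As a quick check, the sketch below tries to compile it with Python's standard `re` module, which requires fixed-width lookbehind and therefore rejects it. `compiles` is a hypothetical helper written for this illustration, not part of any library.

```python
import re

def compiles(pattern):
    """Return True if Python's standard `re` module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The lookbehind contains `\s*`, which has no fixed width.
lookbehind_ok = compiles(r"(?<!^\s*)\s{2,}")
# The word-boundary variant needs no lookbehind at all.
boundary_ok = compiles(r"\b\s{2,}")

print(lookbehind_ok, boundary_ok)
```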
Aaron
  • What regex engine do you use that allows quantifiers inside a lookbehind? I thought none supported that – Federico Piazza Jul 05 '16 at 15:00
  • probably it should be `{2,}` – Dmitry Bychenko Jul 05 '16 at 15:02
  • @Aaron I'm not trying to bug your answer; I've always wanted to use quantifiers in a lookbehind, even in PCRE, and it never worked. Have you tried your regex? Your 2nd expression does not compile: https://regex101.com/r/tJ8zO3/2 – Federico Piazza Jul 05 '16 at 15:07
  • Infinite lookbehinds are not supported in all PCRE-style regex engines; `.NET` is one that does support them – rock321987 Jul 05 '16 at 15:08
  • The `regex` module for Python supports infinite lookbehind: [source](http://stackoverflow.com/a/24987519/1996394) – rock321987 Jul 05 '16 at 15:15
  • @FedericoPiazza ok, I can't read... "each alternative still has to be fixed-length". It's variable-length, sure, but not unbounded-length. Fortunately rock321987 was here to answer your question, so at least Python's `regex` module and `.NET` support unbounded-width lookbehinds. – Aaron Jul 05 '16 at 15:15
2

If your engine supports (*SKIP)(*FAIL), you could also use:

^[ ]+(*SKIP)(*FAIL)  # match spaces at the beginning of a line
                     # these shall fail
|                    # OR
[ ]{2,}              # at least two spaces

See a demo on regex101.com (mind the modifiers: the pattern relies on multiline mode for ^ and on extended mode for the comments).
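
Engines without (*SKIP)(*FAIL) can often approximate the same idea by capturing the part to keep and deciding in a replacement callback. The sketch below is that substitute technique, not the verbs themselves, written for Python's `re` module; the `sample` string is an illustrative stand-in.

```python
import re

# Stand-in text: indented second line, plus extra spaces after a period.
sample = "One OS to rule them     all,\n    One  OS  to  find    them.  Done"

# First alternative: leading spaces, captured so they can be put back unchanged.
# Second alternative: any other run of two or more spaces.
pattern = re.compile(r"(^[ ]+)|[ ]{2,}", re.MULTILINE)

# group(1) is None when the second alternative matched, so fall back to one space.
result = pattern.sub(lambda m: m.group(1) or " ", sample)

print(result)
```

Because the leading-space alternative is tried first at the start of each line, the indentation is consumed and restored before the general `[ ]{2,}` branch can see it.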

Jan