What does this regex syntax actually mean in Java?

Question

I wrote a program to detect palindromes. It works with what I have, but I stumbled upon another bit of syntax and would like to know exactly what it means.

This is the line of code I'm using:

    userString = userString.toLowerCase().replaceAll("[^a-zA-Z]", "");

I understand that the pattern in this replaceAll call means "match any character ([...]) that is not (^) in the ranges a-z or A-Z (a-zA-Z)."

However, this worked as well:

    replaceAll("[^(\\p{L}')]", "");

I just don't understand how to translate that into English. I am completely new to regular expressions, and I find them quite fascinating. Thanks to anyone who can tell me what it means.

http://stackoverflow.com/a/14891168/3166303 – leeor Oct 11 '15 at 03:50

Alde · Accepted Answer · 2015-10-11T07:41:36.207

2

You should check this website: https://regex101.com

It helped me a lot when I was writing/testing/debugging some regexes ;)

It gives the following explanation:

[^(\p{L}')] matches a single character not present in the list below:

( the literal character (
\p{L} any kind of letter from any language
') either of the literal characters ' or )
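To see that explanation in action, here is a minimal sketch in Java. The class name `NegatedClassDemo` and the helper `strip` are made up for illustration; only `String.replaceAll` and the pattern come from the question. Note that inside a Java string literal the backslash has to be doubled, so the pattern is written `\\p{L}`.

```java
class NegatedClassDemo {
    // Hypothetical helper: deletes every character that is NOT
    // a letter, an apostrophe, '(' or ')'.
    static String strip(String input) {
        return input.replaceAll("[^(\\p{L}')]", "");
    }

    public static void main(String[] args) {
        // Spaces, the comma, the digits and '!' are removed;
        // letters, the apostrophe and both parentheses survive.
        System.out.println(strip("Madam, I'm (Adam) 42!")); // prints MadamI'm(Adam)
    }
}
```

So the parentheses inside the brackets are not a group; they are just two more characters in the list.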

edited Oct 11 '15 at 07:41

answered Oct 11 '15 at 03:55


Bohemian · Answer 2 · 2015-10-11T04:01:25.083

-1

The two regexes are not the same:

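[^a-zA-Z] matches any char that is not an English (ASCII) letter
[^(\p{L}')] matches any char that is not a letter (in any language), an apostrophe, or a parenthesis

i.e. the 2nd one leaves parentheses, apostrophes and non-English letters in the string, whereas the 1st one removes them.

The regex \p{L} is the Unicode character class for "any letter". That is, these two regexes match the same characters when the input contains only English letters: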

[a-zA-Z]
\p{L}
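A small side-by-side sketch may make the difference concrete. The class and method names below are hypothetical, and `Locale.ROOT` is an assumption added so that lowercasing does not depend on the machine's default locale. The accented letters are written as `\u00e9` / `\u00c9` escapes to avoid source-encoding surprises.

```java
import java.util.Locale;

class RegexComparison {
    // Keeps only ASCII letters a-z / A-Z.
    static String asciiOnly(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-zA-Z]", "");
    }

    // Keeps letters of any language, plus apostrophes and parentheses.
    static String unicodeLetters(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^(\\p{L}')]", "");
    }

    public static void main(String[] args) {
        // The French text "Été (l'été)!" with the accents escaped.
        String input = "\u00c9t\u00e9 (l'\u00e9t\u00e9)!";
        System.out.println(asciiOnly(input));      // accented letters are dropped
        System.out.println(unicodeLetters(input)); // only the space and '!' are dropped
    }
}
```

With this input the first call leaves just `tlt`, while the second keeps the accented letters, the apostrophe and the parentheses. For a palindrome check on plain English text either pattern could work, but the second one would also leave punctuation such as `'` in the string.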

edited Oct 11 '15 at 04:01

answered Oct 11 '15 at 03:55


4

`\p{L}` matches the Unicode General Category `Letter` (it's not a [POSIX character class](http://www.regular-expressions.info/posixbrackets.html)) – Mariano Oct 11 '15 at 07:30
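As a rough illustration of that distinction: Java does have POSIX-style classes such as `\p{Alpha}`, and by default they cover US-ASCII only, whereas `\p{L}` is a Unicode category. The class and method names in this sketch are invented for the example; `Pattern.UNICODE_CHARACTER_CLASS` is a standard flag in `java.util.regex.Pattern`.

```java
import java.util.regex.Pattern;

class PosixVsUnicode {
    // POSIX-style class: ASCII letters only, unless a Unicode flag is set.
    static boolean isPosixAlpha(String s) {
        return s.matches("\\p{Alpha}");
    }

    // Unicode General Category "Letter".
    static boolean isUnicodeLetter(String s) {
        return s.matches("\\p{L}");
    }

    // Same POSIX-style class, compiled with UNICODE_CHARACTER_CLASS.
    static boolean isAlphaWithUnicodeFlag(String s) {
        return Pattern.compile("\\p{Alpha}", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(s).matches();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // the single character e with acute accent
        System.out.println(isPosixAlpha(eAcute));           // false
        System.out.println(isUnicodeLetter(eAcute));        // true
        System.out.println(isAlphaWithUnicodeFlag(eAcute)); // true
    }
}
```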
