I wrote a program to detect palindromes. It works as written, but I stumbled upon another bit of syntax, and I would like to know exactly what it means.

This is the line of code I'm using:

    userString = userString.toLowerCase().replaceAll("[^a-zA-Z]", "");

I understand that the replaceAll code snippet means to "match characters ([...]) that are not (^) in the range a-z and A-Z (a-zA-Z)."

However, this worked as well:

    replaceAll("[^(\\p{L}')]", "");

I just don't understand how to translate that into English. I am completely new to regular expressions, and I find them quite fascinating. Thanks to anyone who can tell me what it means.

DevOpsSauce

2 Answers

You should check this website: https://regex101.com

It helped me a lot when I was writing/testing/debugging some regexes ;)

It gives the following explanation:

[^(\p{L}')] match a single character not present in the list below:

  • ( the literal character (
  • \p{L} matches any kind of letter from any language
  • ') matches a single character in the list ') literally (the apostrophe and the closing parenthesis)
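
To see that breakdown in action, here is a minimal sketch that tests a handful of single characters against the same pattern. The class name `NegatedClassDemo` and the helper `isRemoved` are made up for illustration; note that the backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Same character class as in the question; "\\p" in Java source is the regex \p.
    static final Pattern NOT_ALLOWED = Pattern.compile("[^(\\p{L}')]");

    // Hypothetical helper: true if replaceAll(..., "") with this pattern would delete c.
    static boolean isRemoved(char c) {
        return NOT_ALLOWED.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // '\u00e9' is e with an acute accent, a letter outside a-z.
        char[] samples = {'a', '\u00e9', '(', ')', '\'', '1', ' ', '!'};
        for (char c : samples) {
            System.out.println("[" + c + "] removed: " + isRemoved(c));
        }
    }
}
```

Letters, both parentheses and the apostrophe should report `false` (they are in the list, so the negated class does not match them); the digit, the space and the exclamation mark should report `true`.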
Alde

The two regexes are not the same:

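  • [^a-zA-Z] matches any character that is not an English letter
  • [^(\p{L}')] matches any character that is not a letter (of any language), an apostrophe or a parenthesis

i.e. the second one leaves parentheses and apostrophes in the string, while the first one removes them.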
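
A quick way to check what each pattern actually strips is to run both on the same input. This is only a sketch: the class and method names are invented here, and `Locale.ROOT` is an addition (the question calls `toLowerCase()` without a locale) so the result does not depend on the machine's default locale.

```java
import java.util.Locale;

class CompareRegexes {
    // The pattern the asker is already using.
    static String asciiLettersOnly(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-zA-Z]", "");
    }

    // The pattern the asker stumbled upon.
    static String lettersParensApostrophes(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^(\\p{L}')]", "");
    }

    // Simple palindrome check by comparing the string with its reverse.
    static boolean isPalindrome(String s) {
        return new StringBuilder(s).reverse().toString().equals(s);
    }

    public static void main(String[] args) {
        String input = "Madam, I'm (Adam)!";
        String a = asciiLettersOnly(input);
        String b = lettersParensApostrophes(input);
        System.out.println(a + " -> palindrome: " + isPalindrome(a));
        System.out.println(b + " -> palindrome: " + isPalindrome(b));
    }
}
```

For this input the first pattern yields `madamimadam`, which is a palindrome, while the second yields `madami'm(adam)`, which is not, because the apostrophe and the parentheses survive.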

The regex \p{L} is the Unicode character class for "any letter" (the general category Letter). I.e. these two regexes are equivalent only when the input contains no letters other than English ones:

  • [a-zA-Z]
  • \p{L}
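
The difference between those two classes only shows up once the input contains letters outside a-z. A small sketch, with hypothetical method names and with the parentheses and apostrophe left out of the class so that only `\p{L}` is being compared (the accented letters are written as precomposed code points via Unicode escapes):

```java
class UnicodeLetterDemo {
    // Keeps only the 52 ASCII letters.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^a-zA-Z]", "");
    }

    // Keeps every code point whose Unicode general category is Letter.
    static String stripNonLetters(String s) {
        return s.replaceAll("[^\\p{L}]", "");
    }

    public static void main(String[] args) {
        String english = "Hello, World!";
        String accented = "na\u00efve caf\u00e9"; // "naive cafe" with i-diaeresis and e-acute

        // Same result for plain English text.
        System.out.println(stripNonAscii(english) + " / " + stripNonLetters(english));
        // Different result once accented letters appear.
        System.out.println(stripNonAscii(accented) + " / " + stripNonLetters(accented));
    }
}
```

On the English string both methods return `HelloWorld`; on the accented one the ASCII range drops the two accented letters while `\p{L}` keeps them.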
Bohemian
    `\p{L}` matches the Unicode General Category `Letter` (it's not a [POSIX character class](http://www.regular-expressions.info/posixbrackets.html)) – Mariano Oct 11 '15 at 07:30