-1

I was experimenting with regular expressions to convert sentences into Pig Latin and decided to extend the assignment to handle punctuation. While looking for a regex that would let me replace punctuation with an empty string, I found myString.replaceAll("\\p{P}", ""); and I'm curious what the \p and the {P} actually do here. Similar questions use "\\p{Z}" to replace whitespace, which makes me think \p matches whatever category is named inside the braces.

Any clarification or pointers to documentation would be much appreciated.

Christopher Oezbek
user8642594
  • 2
    The javadoc explains this: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html – Stephen C May 12 '18 at 01:36
  • 1
    don't know how I missed this. Thank you – user8642594 May 12 '18 at 01:50
  • If I search the Javadocs I don't find `\p{P}` just `\p{Punctuation}` and `\p{Print}`. Similarly I searched the referenced answer which is already supposed to answer this question and also don't find a reference to `\p{P}` – Christopher Oezbek Jan 11 '22 at 15:45
  • The linked question ("[What does this regex mean?](//stackoverflow.com/q/22937618/90527)") doesn't specifically mention the punctuation Unicode category, but does state that `\p` is for Unicode categories, which addresses what `\p` means (which is asked in this question). Whether it's a duplicate depends in part on whether this question is primarily asking for an explanation for `\p`, or what the 'P' category is (in which case, this question is actually asking 2 questions, one of which is a duplicate). – outis Feb 02 '22 at 10:05

1 Answer

4

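In PCRE regular expressions, and likewise in Java's java.util.regex, \p{...} matches any character that has the Unicode property named inside the braces:

  • \p{P} is "any punctuation character" (Unicode general category P)
  • \p{Z} is "any separator character" (Unicode general category Z: space, line, and paragraph separators); it does not match tabs or newlines, for which \s is the usual choice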

See the "EXPLANATION" section on the right: https://regex101.com/r/ZFIKpv/1
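To see the two categories in action from Java, here is a minimal sketch. The class and method names (UnicodeCategoryDemo, stripPunctuation, stripSeparators) are made up for illustration and are not from the question; only Pattern.compile and Matcher.replaceAll are real java.util.regex calls. It also shows a detail that may matter for the Pig Latin use case: \p{P} covers Unicode punctuation only, so symbol characters such as + = $ are left alone.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class UnicodeCategoryDemo {
    // \p{P}: any character in the Unicode general category "Punctuation"
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{P}");
    // \p{Z}: any character in the Unicode general category "Separator"
    private static final Pattern SEPARATOR = Pattern.compile("\\p{Z}");

    static String stripPunctuation(String s) {
        return PUNCTUATION.matcher(s).replaceAll("");
    }

    static String stripSeparators(String s) {
        return SEPARATOR.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Comma and exclamation mark are punctuation, so both are removed.
        System.out.println(stripPunctuation("Hello, world!"));
        // ';' is punctuation, but '+', '=' and '$' are symbols (category S) and stay.
        System.out.println(stripPunctuation("a+b=c; $5"));
        // The space is a separator; the tab is a control character and stays.
        System.out.println(stripSeparators("a b\tc"));
    }
}
```

If the goal is to strip ASCII symbols as well, Java's POSIX class \p{Punct} is one alternative worth checking against the Pattern Javadoc.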

Shamus