
I have this regex:

if (cadena.matches("^[a-zA-Z ]+$")) return true;

It accepts the letters A to Z, both lowercase and uppercase, as well as spaces.

But this only works for English. In Catalan, for instance, we have the 'ç' character, as well as accented characters such as 'á' and 'à'.

I did some googling but couldn't find a way to do this.

I found that I can filter for UTF-8, but that would also accept characters that are not really letters.

How can I implement this?
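For reference, here is a minimal runnable sketch of the check described above. The class and method names are hypothetical. It shows that the ASCII-only class rejects Catalan words containing ç or è:

```java
public class AsciiOnlyCheck {
    // The check from the question: ASCII letters a-z/A-Z plus the space character.
    // String.matches() already requires the whole string to match,
    // so ^ and $ are redundant here but harmless.
    static boolean isAsciiLettersOrSpace(String cadena) {
        return cadena.matches("^[a-zA-Z ]+$");
    }

    public static void main(String[] args) {
        System.out.println(isAsciiLettersOrSpace("Hello world")); // true
        // Catalan word with a c-cedilla: rejected
        System.out.println(isAsciiLettersOrSpace("for\u00e7a"));  // false
        // Catalan word with an e-grave: rejected
        System.out.println(isAsciiLettersOrSpace("qu\u00e8"));    // false
    }
}
```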

logi-kal
Reinherd
  • Take a look at [Unicode blocks](http://jregex.sourceforge.net/gstarted.html#appendix-c). – Linus Kleen Jun 07 '13 at 09:38
  • Dunno if this helps: http://stackoverflow.com/questions/896374/what-is-the-regular-expression-for-a-spanish-word more relevant: http://stackoverflow.com/questions/6548815/how-do-i-match-latin-unicode-characters-in-coldfusion-or-java-regex?rq=1 – wazy Jun 07 '13 at 09:40
  • Also have a look at [stackoverflow.com/questions/9499851/...](http://stackoverflow.com/questions/9499851/regex-for-validating-alphabetics-and-numbers-in-the-localized-string/9500409#9500409) – stema Jun 07 '13 at 10:27

2 Answers


Use this regex:

[\p{L}\s]+

\p{L} means any Unicode letter, and \s means any whitespace character.
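Below is a minimal Java sketch of this answer. It assumes the input is a plain `String`, as in the question, and `UnicodeLettersCheck` and `isLettersOrWhitespace` are hypothetical names. Inside a Java string literal each backslash has to be doubled. Because `String.matches()` already requires the whole string to match, no `^`/`$` anchors are needed. By default, Java's `\s` matches ASCII whitespace (space, tab, newline and so on), which is slightly broader than the single space allowed by the original pattern.

```java
public class UnicodeLettersCheck {
    // The answer's pattern [\p{L}\s]+ written as a Java string literal:
    // each backslash is doubled inside the quotes.
    static boolean isLettersOrWhitespace(String cadena) {
        return cadena.matches("[\\p{L}\\s]+");
    }

    public static void main(String[] args) {
        // Catalan sentence with o-grave, e-acute and c-cedilla
        System.out.println(isLettersOrWhitespace("Aix\u00f2 \u00e9s for\u00e7a")); // true
        // Digits are not letters
        System.out.println(isLettersOrWhitespace("abc123"));                       // false
        // Korean Hangul syllables are Unicode letters too, so they are accepted
        System.out.println(isLettersOrWhitespace("\uC548\uB155"));                 // true
    }
}
```

The last line illustrates the point raised in the comments below: `\p{L}` accepts letters from every script, not only Latin ones.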


mvp
    Doesn't this match non-Latin characters as well, which is not exactly what the OP was looking for (even though they did accept this answer)? It matches `안녕`, for example. It seems like `\p{IsLatin}` is a better fit if you specifically want to match Latin characters ([ref](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html)). – Nick Chammas Nov 20 '16 at 03:05
  • @NickChammas: op explicitly wanted any Unicode letters matched. – mvp Nov 20 '16 at 03:31
  • At this point 3 years after the fact I suppose it's a moot point but the OP's title and intended use case of matching the Catalan alphabet suggest they want to just match Latin characters and not all Unicode (which would include other alphabets, like Korean). I don't see where the OP explicitly wanted to match "any Unicode letter". But anyway, I upvoted this answer because it was helpful. I hope my earlier comment helps others who, like me, came to this page looking for a way to match just Latin characters and not any Unicode. – Nick Chammas Nov 20 '16 at 16:43
  • 3
    To elaborate a bit, in case this is a point of confusion, Latin != ASCII. Most Latin characters, like `ë`, `ɶ`, or `ṧ`, can only be [represented by Unicode](https://en.wikipedia.org/wiki/Latin_script_in_Unicode). `\p{IsLatin}` will match those characters without matching characters from other, non-Latin alphabets. – Nick Chammas Nov 20 '16 at 17:42
  • best solution – yildirimosman Apr 12 '20 at 22:56
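If, as the comments suggest, you only want Latin letters, here is a sketch using the `\p{IsLatin}` script class. `java.util.regex.Pattern` supports script classes since Java 7. The class and method names are hypothetical.

```java
public class LatinLettersCheck {
    // \p{IsLatin} is the Latin *script* property, so letters from other
    // scripts (Hangul, Cyrillic, Greek, ...) are rejected.
    static boolean isLatinOrWhitespace(String cadena) {
        return cadena.matches("[\\p{IsLatin}\\s]+");
    }

    public static void main(String[] args) {
        // Catalan sentence with o-grave, e-acute and c-cedilla
        System.out.println(isLatinOrWhitespace("Aix\u00f2 \u00e9s for\u00e7a")); // true
        // Korean Hangul is not Latin script
        System.out.println(isLatinOrWhitespace("\uC548\uB155"));                 // false
    }
}
```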

Look at the documentation and use a Unicode block class (e.g. `\p{InLATIN_1_SUPPLEMENT}`).
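Here is a sketch of how the block class might be combined with the original ASCII range. The class and method names are hypothetical. One caveat: a Unicode block is a range of code points, not a set of letters, so Latin-1 Supplement (U+0080 to U+00FF) also contains symbols such as × (U+00D7), which is exactly the "not really a letter" problem from the question. Intersecting with `\p{L}` using Java's `&&` class syntax keeps only the letters.

```java
public class Latin1BlockCheck {
    // Block-based pattern: ASCII letters, the Latin-1 Supplement block, and space.
    // Caution: the block also contains non-letters such as U+00D7 (multiplication sign).
    static boolean blockOnly(String s) {
        return s.matches("[a-zA-Z\\p{InLATIN_1_SUPPLEMENT} ]+");
    }

    // Intersect with \p{L} so only the letters from those ranges are accepted;
    // the space is allowed separately via the alternation.
    static boolean blockLettersOnly(String s) {
        return s.matches("(?:[\\p{L}&&[a-zA-Z\\p{InLATIN_1_SUPPLEMENT}]]| )+");
    }

    public static void main(String[] args) {
        String catalan = "for\u00e7a"; // c-cedilla lies in Latin-1 Supplement
        String times = "a \u00d7 b";   // U+00D7 MULTIPLICATION SIGN
        System.out.println(blockOnly(catalan));        // true
        System.out.println(blockOnly(times));          // true (in the block, but not a letter)
        System.out.println(blockLettersOnly(catalan)); // true
        System.out.println(blockLettersOnly(times));   // false
    }
}
```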

Uwe Plonus
    This documentation page does NOT have `Latin1Supplemental` mentioned anywhere. Even googling for `Latin1Supplemental` at `site:oracle.com` does not find it. What gives? – mvp Jun 07 '13 at 09:52
  • It could be named something different. Please check the documentation for `Character.UnicodeBlock`. There is a constant named `LATIN_1_SUPPLEMENT` whose name could be used for the `\p{}` name. – Uwe Plonus Jun 07 '13 at 09:56
  • This should be: `Pattern.compile("\\p{InLATIN_1_SUPPLEMENT}")`. Mind the `In` preceding the `Character.UnicodeBlock` constant. From "Mastering Regular Expressions": "Unicode blocks are supported, requiring an ‘In’ prefix." – Stefan van den Akker Mar 28 '19 at 10:30
  • @StefanvandenAkker You are right. I corrected my answer. – Uwe Plonus Mar 28 '19 at 18:40
  • 1
    @UwePlonus Sorry, that still doesn't compile. It should be either `\p{InLATIN_1_SUPPLEMENT}`, `\p{InLATIN-1 SUPPLEMENT}` or `\p{InLATIN-1SUPPLEMENT}` as per the `idName` and `aliases` taken from `Character.UnicodeBlock.LATIN_1_SUPPLEMENT`. – Stefan van den Akker Mar 29 '19 at 15:37
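Here is a quick sketch to check the spellings mentioned in the last comment. The `Pattern` documentation states that the block names it supports are those accepted by `Character.UnicodeBlock.forName`, which ignores case. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class BlockNameAliases {
    public static void main(String[] args) {
        // The three spellings listed in the comment above: the Java
        // identifier name and two forms of the canonical Unicode block name.
        String[] regexes = {
            "\\p{InLATIN_1_SUPPLEMENT}",
            "\\p{InLATIN-1 SUPPLEMENT}",
            "\\p{InLATIN-1SUPPLEMENT}"
        };
        for (String regex : regexes) {
            // c-cedilla (U+00E7) belongs to the Latin-1 Supplement block
            boolean matches = Pattern.compile(regex).matcher("\u00e7").matches();
            System.out.println(regex + " -> " + matches); // true for each spelling
        }
        // The block a character belongs to can also be looked up directly.
        System.out.println(Character.UnicodeBlock.of('\u00e7')); // LATIN_1_SUPPLEMENT
    }
}
```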