9

Using PHP I want to check that a string contains only alphabetic characters (I do not want to allow any numerals or special characters like !@#$%^&*). ctype_alpha() would seem great for this purpose.

The problem is that I want to allow accented letters, such as those found in French. For example, I want to allow "Lórien".

I know that ctype_alpha() can be used with setlocale(), but that still seems too limited for this use case, since I want to allow characters from all Latin-based languages.

Any ideas how best to accomplish this?


Note: The solution posted at "How can I detect non-western characters?" is great for explicitly detecting non-Latin characters, but it allows special characters and white space, which I do not want to allow:

preg_match('/[^\\p{Common}\\p{Latin}]/u', $string)

I want something that would work like this, but limit the allowed characters to alphabetic characters (so no special characters like !@#$%^&).

Jordan Magnuson

2 Answers

11

How about this regex:

^\p{Latin}+$

Working regex example:

https://regex101.com/r/I5b2mC/1
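
As a rough sketch of how this pattern might be used from PHP: the helper name isLatinAlpha is made up for illustration, and the source file and input are assumed to be UTF-8.

```php
<?php
// Hypothetical helper, not part of PHP or of the original answer.
function isLatinAlpha(string $s): bool
{
    // ^ and $ anchor the match to the whole string.
    // \p{Latin} matches one character from the Latin script.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/^\p{Latin}+$/u', $s) === 1;
}

var_dump(isLatinAlpha('Lórien'));   // bool(true)
var_dump(isLatinAlpha('Lorien1'));  // bool(false), digits are not Latin-script characters
var_dump(isLatinAlpha('Lor ien'));  // bool(false), whitespace is rejected
var_dump(isLatinAlpha('Lorien!'));  // bool(false), punctuation is rejected
```

One thing to watch: in PCRE, $ also matches just before a trailing newline, so "Lorien\n" would pass. If that matters, adding the D modifier (or ending the pattern with \z instead of $) is one option.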

Bryan Elliott
  • Is A-z already contained in \p{Latin}? Don't know, just wondering. –  Feb 07 '14 at 17:45
  • Beautiful. This is exactly what I needed. And indeed, `^[\p{Latin}]+$` seems to work just as well... I had no idea that the solution was so simple... – Jordan Magnuson Feb 07 '14 at 18:07
  • Also in my case it was essential to use the /u (unicode) flag to properly handle UTF-8 encoding. Otherwise it (sometimes) breaks inexplicably and it's really easy to forget about this flag. – Rav Jul 16 '17 at 20:56
3

This might work

 [^\P{latin}\s\p{Punctuation}]

It matches any Latin character that is neither punctuation nor whitespace,
where \P means NOT this property
and \p means this property.

Putting them in a negated class gives:

NOT, NOT Latin = Include All Latin
NOT Punctuation = Exclude Punctuation
NOT Whitespace = Exclude Whitespace
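
The class above matches a single character, so validating a whole string needs anchors and a quantifier around it. A minimal sketch follows, under several assumptions: the name isLatinNoPunct is hypothetical, the ^...+$ wrapper and the u modifier are added here, and the property names are spelled \P{Latin} and \p{P}, the capitalised script name and short category form that PHP's PCRE is known to accept.

```php
<?php
// Hypothetical helper illustrating the double-negative character class.
function isLatinNoPunct(string $s): bool
{
    // [^ ... ] negates everything listed inside:
    //   \P{Latin}  anything that is NOT Latin  -> only Latin remains
    //   \s         whitespace                  -> excluded
    //   \p{P}      punctuation (short form)    -> excluded
    return preg_match('/^[^\P{Latin}\s\p{P}]+$/u', $s) === 1;
}

var_dump(isLatinNoPunct('Lórien'));   // bool(true)
var_dump(isLatinNoPunct('Lórien!'));  // bool(false)
var_dump(isLatinNoPunct('Lor ien'));  // bool(false)
var_dump(isLatinNoPunct('Lorien$'));  // bool(false), "$" is not a Latin-script character
```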