2

I am trying to set a validation rule for a field in my form that checks that the input only contains letters.

At first I tried to write a function that returns true if there are no numbers in the string. For that I used preg_match:

function my_format($str)
{
   return preg_match('/^([^0-9])$', $str);
}

It doesn't matter how many times I look at the php manual, it seems like I won't get to understand how to create the pattern I want. What's wrong with what I made?

But I'd like to extend the question: I want the input text to contain any letter but no numbers or symbols, like question marks, exclamation marks, and all those you can imagine. BUT the letters I want are not only a-z, I want letters with all kinds of accents, such as those used in Spanish, Portuguese, Swedish, Polish, Serbian, Icelandic...

I guess this is no easy task, and hard or impossible to do with preg_match. Is there any library that covers my exact needs?

Alan Moore
dabadaba
  • Input is utf-8 ? So you want to check, if the string only consists of white-spaces and any kind of letter from any kind of language, nothing else? – Jonny 5 Dec 25 '13 at 21:33

4 Answers

6

If you're using UTF-8 encoded input, go for a Unicode regex by adding the u modifier.

This one would match a string that only consists of letters and any kind of whitespace/invisible separators:

preg_match('~^[\p{L}\p{Z}]+$~u', $str);
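
As a rough sketch of how this pattern behaves (the helper name `is_letters_and_spaces` is made up for illustration, and the `\u{...}` string escapes assume PHP 7 or later):

```php
<?php
// Hypothetical wrapper around the pattern above.
function is_letters_and_spaces(string $str): bool
{
    // \p{L}: any Unicode letter; \p{Z}: any Unicode separator (space, NBSP, ...).
    // preg_match returns 1 on a match, 0 on no match and false on error.
    return preg_match('~^[\p{L}\p{Z}]+$~u', $str) === 1;
}

var_dump(is_letters_and_spaces("Jos\u{00E9} \u{00C1}ngel")); // bool(true)
var_dump(is_letters_and_spaces("\u{0141}ukasz"));            // bool(true)
var_dump(is_letters_and_spaces('abc123'));                   // bool(false)
var_dump(is_letters_and_spaces('Hola!'));                    // bool(false)
```

Note that a string made only of spaces also passes, since \p{Z} is in the class. If that is not wanted, trim the input first or require at least one \p{L}.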
Jonny 5
3

First of all, Merry Christmas.

You are on the right track with the first one, just missing a + to match one or more non-number characters:

preg_match('/^([^0-9]+)$/', $str);

As you can see, 0-9 is a range, from the digit 0 to 9. The same applies to other cases like a-z or A-Z: inside a character class the '-' is special and indicates a range. For 0-9 you can use the shorthand \d, like:

preg_match('/^([^\d]+)$/', $str);

For symbols, if your list is the punctuation characters . , " ' ? ! ; : # $ % & ( ) * + - / < > = @ [ ] \ ^ _ { } | ~, there is a shorthand:

preg_match('/^([^[:punct:]]+)$/', $str);

Combined you get:

preg_match('/^([^[:punct:]\d]+)$/', $str);
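
One thing to keep in mind with a negated class is that it accepts everything that is not listed, which includes spaces and tabs. A small sketch of that behaviour (the function name `no_punct_or_digits` is only illustrative):

```php
<?php
// Hypothetical wrapper around the combined pattern above.
function no_punct_or_digits(string $str): bool
{
    // Accepts any non-empty string without ASCII punctuation or digits.
    return preg_match('/^([^[:punct:]\d]+)$/', $str) === 1;
}

var_dump(no_punct_or_digits('abc'));         // bool(true)
var_dump(no_punct_or_digits('hello world')); // bool(true): a space is allowed
var_dump(no_punct_or_digits("a\tb"));        // bool(true): so is a tab
var_dump(no_punct_or_digits('abc123'));      // bool(false)
var_dump(no_punct_or_digits('hello!'));      // bool(false)
```

As far as I can tell, without the u modifier [:punct:] and \d only cover ASCII, so this pattern does not reject non-ASCII symbols. A whitelist of letters is probably the safer direction if the goal is "letters only".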
David Lin
2
function my_format($str)
{
   return preg_match('/^\p{L}+$/u', $str);
}

Simpler than you think!

\p{L} matches any kind of letter from any language. With UTF-8 input, the u modifier is needed so that multibyte letters are read as single characters.
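
A minimal sketch of how the check behaves with the u modifier, including what happens when the input is not valid UTF-8 (the helper name `only_letters` is hypothetical, and the `\u{...}` escape assumes PHP 7 or later):

```php
<?php
// Hypothetical helper: true only if $str is one or more Unicode letters.
function only_letters(string $str): bool
{
    // With u, pattern and subject are treated as UTF-8.
    // If $str is not valid UTF-8, preg_match returns false,
    // so the strict comparison with 1 rejects it as well.
    return preg_match('/^\p{L}+$/u', $str) === 1;
}

var_dump(only_letters("Jos\u{00E9}")); // bool(true)
var_dump(only_letters('abc1'));        // bool(false)
var_dump(only_letters("\xFF"));        // bool(false): invalid UTF-8
```

Comparing with `=== 1` rather than returning the raw result keeps the error case (false) from being confused with "no match" (0) further down the line.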

revo
1

Use the [:alpha:] POSIX expression.

function my_format($str) {
    return preg_match('/[[:alpha:]]+/u', $str);
}

The outer [] wraps the POSIX class in a character class, and the + then matches 1 or more alphabetical characters. With the u modifier, the :alpha: POSIX class matches accented characters as well.

If you want to include whitespace, just add \s to the range:

preg_match('/[[:alpha:]\s]+/u', $str);

EDIT: Sorry, I misread your question when I looked over it a second time and thought you wanted punctuation. I've taken it back out.
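
Note that the patterns above are not anchored, so they succeed as soon as any letter appears somewhere in the string. A sketch of the difference, assuming the intent is that the whole input must be letters (both helper names are hypothetical):

```php
<?php
function has_letters(string $str): bool
{
    // Unanchored: true as soon as a run of letters appears anywhere.
    return preg_match('/[[:alpha:]]+/u', $str) === 1;
}

function all_letters(string $str): bool
{
    // Anchored with ^ and $: the whole string must consist of letters.
    return preg_match('/^[[:alpha:]]+$/u', $str) === 1;
}

var_dump(has_letters('abc123!')); // bool(true)
var_dump(all_letters('abc123!')); // bool(false)
var_dump(all_letters('abc'));     // bool(true)
```

By default `$` also matches just before a trailing newline, so `"abc\n"` passes the anchored check. Adding the D modifier or using `\z` instead of `$` avoids that.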

Community
sjagr
  • Thanks. This was a good reminder to me to use [:alpha:] and not [a-zA-Z]. – Elliptical view Dec 25 '13 at 21:44
  • When you use `[[:alpha:]]` POSIX named class you should use `u` flag to match accented characters as well, which you didn't. – revo Dec 25 '13 at 21:54
  • @revo Do you have a source for this? Does the `u` follow the `/` after the regex? – sjagr Dec 25 '13 at 21:56
  • I'm too lazy to hunt it down for you, but you can test it somewhere online, and yes, the flag follows the `/` at the end. – revo Dec 25 '13 at 22:00
  • [Providing my own source](http://php.net/manual/en/reference.pcre.pattern.modifiers.php) for @revo's instructions. The `u` flag treats pattern strings as UTF-8. – sjagr Dec 25 '13 at 22:00
  • With that everything matches, I want to exclude numbers and symbols. – dabadaba Dec 26 '13 at 13:04