-1

I want to check whether a string contains only letters, digits, and special characters common in Europe. I found answers like "How to check, if a php string contains only english letters and digits?", but they do not cover French é and è, German äöüß, or Romanian ă. I also want to allow commonly used special characters such as €, !"§$%&/()=#|<>

Does anybody have a complete set of these characters that I could build a check from?

Werner
  • What are you asking, what other characters to add to your list of “often use[d] special-chars”? Well that would be up to you, wouldn’t it? We can not know what you want to restrict and _why_. – misorude Feb 12 '20 at 11:44
  • Right. It's an online shop that accepts orders from the whole EU. They often get messages through online forms in Cyrillic or Chinese, which are hard to translate; those should get a response like "Please use another language." But, for example, the French name "René" should be allowed, as should €, %, or !, which might occur in a valid request. So perhaps other people have a list of such characters to check against. – Werner Feb 12 '20 at 11:50
  • Do you need to allow all the characters you referred to including English? – unclexo Feb 12 '20 at 12:10

3 Answers

2

You can test for Latin characters with \p{Latin}, making sure to use the u regex flag so that the pattern and the subject are treated as UTF-8:

<?php
$tests = [
    'éèäöüßäöüßäöüßäöü',
    'abcdeABCDE',
    '€, !"§$%&/()=#|<>',
    'ÄäAa',
    '*',
    'Здравствуйте'
];

foreach ($tests as $test) {
    if (!preg_match('/[^\p{Latin}0-9€, !"§$%&\/()=#|<>]/u', $test)) {
        echo "$test is okay\n";
    }
}

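Prints (the '*' and the Cyrillic string contain characters outside the allowed set, so they produce no output):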

éèäöüßäöüßäöüßäöü is okay
abcdeABCDE is okay
€, !"§$%&/()=#|<> is okay
ÄäAa is okay
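
For use in a form handler, the same check could be wrapped in a function that returns a boolean. This is only a minimal sketch: the function name `isEuropeanText` is hypothetical, and the allowed punctuation is simply the list from the question, so adjust it to your needs.

```php
<?php
// Hypothetical helper (name is an assumption, not from the answer above):
// returns true when $text contains only Latin-script letters, ASCII digits,
// and the punctuation listed in the question.
function isEuropeanText(string $text): bool
{
    // \p{Latin} matches any letter of the Latin script (é, ä, ß, ă, ...).
    // The u flag makes PCRE treat the pattern and the subject as UTF-8.
    $disallowed = '/[^\p{Latin}0-9€, !"§$%&\/()=#|<>]/u';

    // preg_match returns 1 if a disallowed character is found, 0 if none is,
    // and false on error (for example, invalid UTF-8 input).
    // Only a clean 0 counts as valid, so malformed input is rejected too.
    return preg_match($disallowed, $text) === 0;
}

var_dump(isEuropeanText('René'));          // bool(true)
var_dump(isEuropeanText('Здравствуйте'));  // bool(false)
```

Comparing against 0 with === rather than negating the result means that input which is not valid UTF-8 is treated as invalid instead of slipping through.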
Booboo
0

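I think you can use a regex. Note that this pattern accepts only ASCII letters and digits, so accented letters such as é are rejected as well:

// Anchored with ^ and $ so the whole string must consist of allowed characters
$re = '/^[A-Za-z0-9]+$/';
$str = 'человек';
// preg_match returns 1 when the string matches and 0 when it does not
var_dump(preg_match($re, $str));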
-1

Characters not in a-z & A-Z would be:

[^a-zA-Z]

So you may use something like:

Regex_CountMatches([String_Field],"[^a-zA-Z]")

Because this function has a case option (default value of 1 is case insensitive), just searching for [^a-z] may work too.

i am batman
  • Thanks for your reply. The problem is that a-zA-Z doesn't contain chars like éèäöü, which should be valid too. My problem is not to get this into a regexp but to get a list "as complete as possible" of chars that might be valid. – Werner Feb 12 '20 at 11:56
  • This matches any character that is not a Latin letter. – Toto Feb 12 '20 at 13:47