
Title says it all: I am checking to see if a user's username contains anything that isn't a number or letter, such as €{¥]^}+<€, punctuation, spaces or even things like âæłęč. Is this possible in PHP?

jchernin4
  • You can do this, but why? Mr James O'Reilly, Ms Victoria Van de Waal, and Monsieur René Duchamps might all have something to say about it. –  May 26 '19 at 01:44
  • @Redd Herring I want to store the username as something that doesn't have special characters so you can't have things like invisible names with invisible characters and stuff. Basically I wanna just type 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ and check if every character is part of that list. – jchernin4 May 26 '19 at 02:07
  • Fair enough. I misread your question and interpreted username to mean user's name. Apologies! –  May 26 '19 at 02:52

2 Answers


PHP does REGEX

What you want to do is fairly straightforward: PHP has a number of regex functions.

Testing a String For a Character

If all you want is to know IF a string contains non-alphanumeric characters, then just use preg_match():

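preg_match( '/[^A-Za-z0-9]/', $userName );

This returns 1 if the username contains at least one character other than A-Z, a-z or 0-9, and 0 if every character is alphanumeric. Note that the character class takes no * quantifier here: with *, the pattern also matches zero characters, so preg_match() would return 1 for every string.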
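A minimal sketch of that check wrapped in a helper. The function name hasNonAlphanumeric is hypothetical (it is not a PHP built-in), and the pattern carries no quantifier so that it only matches when an offending character is actually present.

```php
<?php
// Hypothetical helper: true when $userName contains at least one
// character outside A-Z, a-z and 0-9.
function hasNonAlphanumeric(string $userName): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/[^A-Za-z0-9]/', $userName) === 1;
}

var_dump(hasNonAlphanumeric('jchernin4'));     // bool(false)
var_dump(hasNonAlphanumeric('René Duchamps')); // bool(true)
```

An empty string also yields false here, because it contains no disallowed character; if empty usernames should be rejected, that needs a separate check.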

Regex Pattern Elements

PCRE regex patterns open and close with a delimiter such as a slash /, and the whole pattern is passed as a quoted string: '/myPattern/'. Some other key features are:

[ brackets contain match sets ]
[a-z] // means match any lowercase letter
This pattern checks the current character in the string against the set in the brackets, in this case any lowercase letter a to z.

^ Caret (Meta-Character)
[^a-z] // means no lowercase letters
If the caret ^ (aka hat) is the first character inside brackets, it NEGATES the set, so [^A7] means match anything EXCEPT uppercase A and the numeral 7. (Note: when outside brackets, the caret ^ means the start of the string.)

\w\W\d\D\s\S. Meta-Characters (WildCards)
\w // match any word character
An escaped (i.e. preceded by a backslash \) lowercase w matches any "word" character, i.e. alphanumeric or the underscore _; it is shorthand for [A-Za-z0-9_]. The uppercase \W is the NOT word character, equivalent to [^A-Za-z0-9_] or [^\w].

.   // (dot) match ANY single character except return/newline
\w  // match any word character [A-Za-z0-9_]
\W  // NOT any word character [^A-Za-z0-9_]
\d  // match any digit [0-9]
\D  // NOT any digit [^0-9]
\s  // match any whitespace (tab, space, newline)
\S  // NOT any whitespace 
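A few illustrative calls, as a sketch of how these shorthand classes behave with preg_match() in its default (non-Unicode) mode. The variable names and sample strings are made up for the example.

```php
<?php
// Each call returns 1 if the pattern matches somewhere in the subject, else 0.
$hasDigit      = preg_match('/\d/', 'user42');     // 1: contains a digit
$hasWhitespace = preg_match('/\s/', 'two words');  // 1: contains a space
$hasNonWord    = preg_match('/\W/', 'snake_case'); // 0: underscore is a word character
$hasNonWord2   = preg_match('/\W/', 'kebab-case'); // 1: hyphen is not a word character
$dotOnNewline  = preg_match('/./', "\n");          // 0: dot does not match a newline
```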

.*+?| Meta-Characters (Quantifiers)
These apply outside of a set []; inside brackets they are treated as literal characters.

*   // match previous character or [set] zero or more times,
    // so .* means match everything (including nothing) until reaching a return/newline.
+   // match previous one or more times.
?   // match previous zero or one time (i.e. optional).
|   // means logical OR, e.g. com|net means match either literal "com" or "net"
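The difference between * and + matters most when a pattern can match nothing at all. The sketch below uses made-up sample strings; the ^ and $ anchors force each pattern to cover the whole subject.

```php
<?php
$oneOrMore  = preg_match('/^[a-z]+$/', 'abc');    // 1
$emptyPlus  = preg_match('/^[a-z]+$/', '');       // 0: + needs at least one character
$emptyStar  = preg_match('/^[a-z]*$/', '');       // 1: * also accepts zero characters
$optional   = preg_match('/^colou?r$/', 'color'); // 1: the u is optional
$either     = preg_match('/\.(com|net)$/', 'example.net'); // 1

// Pitfall: an unanchored pattern with * can succeed on a zero-length match,
// so this returns 1 even though 'abc' has no character outside a-z.
$zeroLength = preg_match('/[^a-z]*/', 'abc');     // 1
```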

Not shown: capture groups, backreferences, substitution (the real power of regex). See https://www.phpliveregex.com/#tab-preg-match for more including a live pattern-match playground that is based on the PHP functions, and delivers results as arrays.

Back To Your String Cleaning

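So for your pattern, to match anything that is not a letter or a number (underscores included), you need either '/[^A-Za-z0-9]/' or '/[\W_]/'. For a yes/no test with preg_match(), leave off the * quantifier, since a pattern that can match zero characters matches every string.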

Strip Search

If instead you want to STRIP all the non-alphanumeric characters from a string then use preg_replace( $Regex, $Replacement, $StringToClean )

<?php
    $username = 'Svéñ déGööfinøff';
    // + removes each run of one or more non-alphanumeric characters (or underscores)
    echo preg_replace('/[\W_]+/', '', $username);
?>

The output is: SvdGfinff

If you'd prefer to replace certain accented letters with standard latin ones to keep the names reasonably readable, then I believe you'd need a lookup table (array). There is one ready to use at the PHP site
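A rough sketch of the lookup-table idea using strtr(). The $map array below is a hypothetical, deliberately tiny table covering only the characters in this example, not the ready-made one mentioned above, and it assumes both the source file and the input are UTF-8 with precomposed accented letters.

```php
<?php
// Hypothetical mini lookup table; a real one would cover far more characters.
$map = [
    'é' => 'e',
    'ñ' => 'n',
    'ö' => 'o',
    'ø' => 'o',
];

$username = 'Svéñ déGööfinøff';

// Step 1: swap accented letters for plain latin ones.
$latin = strtr($username, $map);

// Step 2: strip whatever is still not a letter or digit (here, the space).
$clean = preg_replace('/[^A-Za-z0-9]+/', '', $latin);

echo $clean; // SvendeGoofinoff
```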

Myndex

You can use the ctype_alnum() function in PHP.

From the manual..

Check for alphanumeric character(s)
Returns TRUE if every character in text is either a letter or a digit, FALSE otherwise.

var_dump(ctype_alnum("æøsads")); // false
var_dump(ctype_alnum("123asd")); // true
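As a sketch of how this could be used for username validation: the helper name isAlphanumericUsername is hypothetical, and the is_string() guard is an added assumption so that integers are never interpreted as character codes. The results assume the default C locale.

```php
<?php
// Hypothetical helper: true only for a non-empty string made up
// entirely of letters and digits (as ctype_alnum() defines them).
function isAlphanumericUsername($value): bool
{
    // Reject non-strings outright instead of passing them to ctype_alnum().
    return is_string($value) && ctype_alnum($value);
}

var_dump(isAlphanumericUsername('123asd')); // bool(true)
var_dump(isAlphanumericUsername('æøsads')); // bool(false)
var_dump(isAlphanumericUsername(13));       // bool(false)
```

ctype_alnum() returns false for an empty string, so empty usernames are rejected as well.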
Qirel
  • For this to be safe to use, the variable must be a string. If the input provided is an integer, e.g. 13, it will return false. I would wrap the value in `strval()` to avoid this situation, unless the OP is sure that the value will always be a string, even for numeric values. – Elias Soares May 26 '19 at 01:53
  • "Note: If an integer between -128 and 255 inclusive is provided, it is interpreted as the ASCII value of a single character (negative values have 256 added in order to allow characters in the Extended ASCII range). Any other integer is interpreted as a string containing the decimal digits of the integer." – Elias Soares May 26 '19 at 01:54
  • @Qirel That looks like what I'm looking for! Thanks for your help :) – jchernin4 May 26 '19 at 02:10