Only allowing certain characters in a string

Question

I want to disallow all symbols in a string, and instead of disallowing each one individually, I thought it'd be easier to allow only alphanumeric characters (a-z A-Z 0-9).

How would I go about parsing a string and converting it to one which only has allowed characters? I also want to convert any spaces into _.

At the moment I have:

function parseFilename($name) {
    $allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'; // not used yet
    $name = str_replace(' ', '_', $name);

    return $name;
}
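For comparison, here is a minimal sketch that finishes the question's own `$allowed` approach without any regex. It assumes plain ASCII input and adds `_` to the allowed set so that converted spaces survive the filter. Any non-ASCII character is dropped byte by byte, because none of its bytes appear in `$allowed`.

```php
<?php
function parseFilename(string $name): string {
    // Characters we keep; '_' is added so converted spaces are not filtered out.
    $allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_';
    $name = str_replace(' ', '_', $name);

    $result = '';
    // Walk the string byte by byte and keep only bytes found in $allowed.
    for ($i = 0, $len = strlen($name); $i < $len; $i++) {
        if (strpos($allowed, $name[$i]) !== false) {
            $result .= $name[$i];
        }
    }
    return $result;
}

echo parseFilename('my file#1.txt'), "\n"; // my_file1txt
```

The regex answers below do the same job in one call and are usually preferable. This version only shows that the allowed-list idea works as originally intended.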

Thanks

possible duplicate of [PHP regex, replace all trash symbols with underscores](http://stackoverflow.com/questions/6089503/php-regex-replace-all-trash-symbols-with-underscores) — mario, May 16 '12 at 20:09
possible duplicate of [Regex to strip out everything but words and numbers (and latin chars)](http://stackoverflow.com/questions/6982915/regex-to-strip-out-everything-but-words-and-numbers-and-latin-chars) — mario, May 16 '12 at 20:10
or [PHP: the best way to remove punctuation marks, symbols, diacritics, special characters, etc](http://stackoverflow.com/questions/4762546/php-the-best-way-to-remove-punctuation-marks-symbols-diacritics-special-char) — mario, May 16 '12 at 20:11

score 2 · Answer 1 · answered May 16 '12 at 20:09

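Try the following, which deletes every character outside a-z, A-Z and 0-9. Note that spaces are deleted too, so on its own it does not handle the space-to-underscore part of the question: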

$name = preg_replace("/[^a-zA-Z0-9]/", "", $name);
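A small sketch of the difference on a sample filename (the input string is made up for illustration). Converting spaces first and adding `_` to the character class keeps the underscores that the one-liner would otherwise strip.

```php
<?php
$name = 'my file#1.txt';

// The pattern as written above: spaces are removed along with the symbols.
$stripped = preg_replace('/[^a-zA-Z0-9]/', '', $name);

// Convert spaces first and allow '_' so the underscores survive.
$withUnderscores = preg_replace('/[^a-zA-Z0-9_]/', '', str_replace(' ', '_', $name));

echo $stripped, "\n";        // myfile1txt
echo $withUnderscores, "\n"; // my_file1txt
```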

Ansari

score 1 · Answer 2 · answered May 16 '12 at 23:20

You could do both replacements at once by passing arrays as the find / replace parameters of preg_replace():

$str = 'abc def+ghi&jkl   ...z';
$find = array( '#[\s]+#','#[^\w]+#' );
$replace = array( '_','' );
$newstr = preg_replace( $find,$replace,$str );
print $newstr;

// outputs:
// abc_defghijkl_z

[\s]+ matches a run of whitespace (replaced with a single underscore), and, as @F.J described, [^\w]+ matches a run of anything that is "not a word character" (replaced with an empty string).
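When preg_replace() gets arrays, it applies the patterns in array order. This sketch, using the same patterns as above on a made-up input, shows why the whitespace pattern has to come first. It also shows that \s covers tabs and newlines, not just spaces.

```php
<?php
$find    = array('#[\s]+#', '#[^\w]+#');
$replace = array('_', '');
$input   = "tab\there\nnew line";

// Whitespace runs become '_' first, then remaining non-word characters are removed.
$inOrder = preg_replace($find, $replace, $input);

// Reversed order: [^\w]+ deletes the whitespace before it can become '_'.
$reversed = preg_replace(array_reverse($find), array_reverse($replace), $input);

echo $inOrder, "\n";  // tab_here_new_line
echo $reversed, "\n"; // tabherenewline
```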

score 0 · Answer 3 · answered May 16 '12 at 20:13

preg_replace() is the way to go here; the following should do what you want:

function parseFilename($name) {
    $name = str_replace(' ', '_', $name);
    $name = preg_replace('/[^\w]+/', '', $name);
    return $name;
}

[^\w] is equivalent to [^a-zA-Z0-9_] (in PHP's default, non-Unicode mode), so it matches any character that is not alphanumeric or an underscore. The + after it means "one or more", so a whole run of disallowed characters is removed in a single replacement; this should be slightly more efficient than replacing each character individually.
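A quick check of both claims on a made-up input. With or without the +, the result is the same (only the number of replacements differs), and the explicit class gives the same output as [^\w]. Any speed difference is not measured here.

```php
<?php
$input = 'a!!!b  c---d';

// One replacement per offending character...
$perChar = preg_replace('/[^\w]/', '', $input);
// ...versus one replacement per run of offending characters.
$perRun = preg_replace('/[^\w]+/', '', $input);
// The explicit ASCII class, for comparison with [^\w].
$explicit = preg_replace('/[^a-zA-Z0-9_]+/', '', $input);

var_dump($perChar === $perRun);  // bool(true)
var_dump($explicit === $perRun); // bool(true)
echo $perRun, "\n";              // abcd
```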

score 0 · Answer 4 · answered Mar 13 '22 at 05:56

Replacing spaces with underscores does not require the might of the regex engine; a plain str_replace() can do it before the regex runs.

Purging every character that is not alphanumeric or an underscore is handled concisely by \W: it matches any character not in a-z, A-Z, 0-9 or _.

Code:

function sanitizeFilename(string $name): string {
    return preg_replace(
               '/\W+/',
               '',
               str_replace(' ', '_', $name)
           );
}

echo sanitizeFilename('This/is My     1! FilenAm3');

Output:

Thisis_My_____1_FilenAm3

...but if you want to condense consecutive spaces into a single underscore, then use a regex for that step as well:

function sanitizeFilename(string $name): string {
    return preg_replace(
               ['/ +/', '/\W+/'],
               ['_', ''],
               $name
           );
}

echo sanitizeFilename('This/has a      Gap !n 1t');

Output:

Thishas_a_Gap_n_1t
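Note that '/ +/' only matches the literal space character. In the sketch below (input made up for illustration), a tab is treated as a symbol and deleted. Swapping in '/\s+/' would also turn tabs and newlines into underscores, if that is the behaviour you want.

```php
<?php
$name = "tab\tand  spaces";

// '/ +/' matches spaces only, so \W+ deletes the tab afterwards.
$spacesOnly = preg_replace(['/ +/', '/\W+/'], ['_', ''], $name);

// '/\s+/' matches spaces, tabs and newlines, so all of them become '_'.
$allWhitespace = preg_replace(['/\s+/', '/\W+/'], ['_', ''], $name);

echo $spacesOnly, "\n";    // taband_spaces
echo $allWhitespace, "\n"; // tab_and_spaces
```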

score -1 · Answer 5 · Leontin Groza · 2019-02-15T14:07:50.573

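Try working with the HTML part: the pattern attribute of an input makes the browser refuse to submit a value that does not match, for example:

<input type="text" pattern="[A-Za-z]{8}" title="Eight letter country code">

This only validates input in the browser; it does not convert the string, so the server-side code still has to sanitize it.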
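Because a browser-side check can be bypassed, a server-side counterpart might look like the sketch below. It assumes the same 8-letter rule, and `isValidCode` is a hypothetical helper name. The HTML pattern must match the whole value, so the PHP regex is anchored at both ends; \z is used instead of $ so that a trailing newline is not accepted.

```php
<?php
// Hypothetical helper mirroring pattern="[A-Za-z]{8}" on the server.
// ^ and \z anchor the match to the entire string, like the HTML attribute.
function isValidCode(string $value): bool {
    return preg_match('/^[A-Za-z]{8}\z/', $value) === 1;
}

var_dump(isValidCode('abcdEFGH')); // bool(true)
var_dump(isValidCode('abc def!')); // bool(false)
```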

edited Feb 15 '19 at 14:07

answered Feb 15 '19 at 14:04


Try looking into this – Leontin Groza Feb 15 '19 at 14:05
