0

I want to disallow all symbols in a string, and instead of going and disallowing each one I thought it'd be easier to just allow alphanumeric characters (a-z A-Z 0-9).

How would I go about parsing a string and converting it to one which only has allowed characters? I also want to convert any spaces into _.

At the moment I have:

function parseFilename($name) {
    $allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $name = str_replace(' ', '_', $name);

    return $name;
}

Thanks

James Dawson
  • possible duplicate of [PHP regex, replace all trash symbols with underscores](http://stackoverflow.com/questions/6089503/php-regex-replace-all-trash-symbols-with-underscores) – mario May 16 '12 at 20:09
  • possible duplicate of [Regex to strip out everything but words and numbers (and latin chars)](http://stackoverflow.com/questions/6982915/regex-to-strip-out-everything-but-words-and-numbers-and-latin-chars) – mario May 16 '12 at 20:10
  • or [PHP: the best way to remove punctuation marks, symbols, diacritics, special characters, etc](http://stackoverflow.com/questions/4762546/php-the-best-way-to-remove-punctuation-marks-symbols-diacritics-special-char) – mario May 16 '12 at 20:11

5 Answers

2

Try

$name = preg_replace("/[^a-zA-Z0-9]/", "", $name);
Ansari
1

You could do both replacements at once by passing arrays as the find / replace params to preg_replace():

$str = 'abc def+ghi&jkl   ...z';
$find = array( '#[\s]+#','#[^\w]+#' );
$replace = array( '_','' );
$newstr = preg_replace( $find,$replace,$str );
print $newstr;

// outputs:
// abc_defghijkl_z

[\s]+ matches any run of whitespace (replaced with a single underscore), and as @F.J described, [^\w] is anything "not a word character" (replaced with an empty string).
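
One thing worth seeing in action: \s covers tabs and newlines as well as spaces, which a plain str_replace(' ', '_', ...) would not touch. A minimal sketch, assuming a made-up input string chosen only for illustration:

```php
<?php
// Hypothetical input: a tab, a newline, and a run of spaces between words.
$input = "report\t2012\nfinal   draft!.txt";

// First pattern: any run of whitespace becomes one underscore.
// Second pattern: any run of non-word characters is removed.
$clean = preg_replace(array('#\s+#', '#[^\w]+#'), array('_', ''), $input);

echo $clean, "\n"; // report_2012_final_drafttxt
```

The patterns are applied in array order, so the whitespace is already converted to underscores by the time the second pattern strips the remaining symbols.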

traq
0

preg_replace() is the way to go here, the following should do what you want:

function parseFilename($name) {
    $name = str_replace(' ', '_', $name);
    $name = preg_replace('/[^\w]+/', '', $name);
    return $name;
}

[^\w] is equivalent to [^a-zA-Z0-9_], which matches any character that is not alphanumeric or an underscore. The + after it means "match one or more", which should be slightly more efficient than replacing each character individually.
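
As a quick check of the two points above, here is a small sketch. The sample filename is hypothetical, and the equivalence is only shown for plain ASCII input without the u modifier:

```php
<?php
// Hypothetical sample filename, used only for illustration.
$sample = 'my file (v2).txt';

// On this ASCII input both character classes remove the same characters.
$shorthand = preg_replace('/[^\w]+/', '', $sample);
$explicit  = preg_replace('/[^a-zA-Z0-9_]+/', '', $sample);
var_dump($shorthand === $explicit); // bool(true)

// Spaces must become underscores before the regex runs,
// otherwise they are stripped along with the other symbols.
$spacesFirst = preg_replace('/[^\w]+/', '', str_replace(' ', '_', $sample));
$regexFirst  = str_replace(' ', '_', preg_replace('/[^\w]+/', '', $sample));

echo $spacesFirst, "\n"; // my_file_v2txt
echo $regexFirst, "\n";  // myfilev2txt
```

This is why the function calls str_replace() first: a space is itself a non-word character, so running the regex first leaves nothing for str_replace() to convert.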

Andrew Clark
0

Replacing spaces with underscores does not require the might of the regex engine; str_replace() can handle that before the regex pass runs.

Purging every character that is neither alphanumeric nor an underscore is concisely handled by \W -- it matches any character not in a-z, A-Z, 0-9, or _.

Code:

function sanitizeFilename(string $name): string {
    return preg_replace(
               '/\W+/',
               '',
               str_replace(' ', '_', $name)
           );
}

echo sanitizeFilename('This/is My     1! FilenAm3');

Output:

Thisis_My_____1_FilenAm3

...but if you want to condense consecutive spaces and replace each run with a single underscore, then use regex for both steps.

function sanitizeFilename(string $name): string {
    return preg_replace(
               ['/ +/', '/\W+/'],
               ['_', ''],
               $name
           );
}

echo sanitizeFilename('This/has a      Gap !n 1t');

Output:

Thishas_a_Gap_n_1t
mickmackusa
-1

Try handling this in the HTML part instead, using the pattern attribute on the input element:

pattern="[A-Za-z]{8}" title="Eight letter country code">