Get values which contain only whitelisted characters from a comma-delimited string

Question

I have an array (converted from a string) that contains words with non-standard letters (letters not used in English, like ć, ä, ü). I don't want to replace those characters; I want to remove every word that contains them.

from [Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12]
to   [Adam-Smith, Christine, Roger]

This is what I got so far:

<?php 
    $tags = "Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12";

    $tags_array = preg_split("/\,/", $tags); 

    $tags_array = array_filter($tags_array, function($value){
       return strstr($value, "a") === false;
    });

    foreach($tags_array as $tag) {
        echo "<p>".$tag."</p>";
    }
?>

I have no idea how to delete words that contain characters other than [a-z, A-Z, 0-9] and [(), "", -, +, &, %, @, #]. Right now the code deletes every word containing an "a". How can I achieve this?

What is your meaning when you say `""` between `()` and `-`? I don't know how you want to exclude an empty space (zero-width character). — mickmackusa, Oct 10 '22 at 02:00

bobi · Answer 1 · 2022-10-07T11:47:25.800

2

This should do the job:

https://onlinephp.io/c/dd46c

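$tags = ['Adam-Smith', 'Christine', 'Müller', 'Roger', 'Hauptstraße', 'X Æ A-12'];
$output = [];

foreach ($tags as $word) {
    // Keep the word only if it has no character outside the whitelist.
    // Note: digits and spaces are not part of this whitelist.
    if (!preg_match('/[^A-Z\-a-z!@#$%\^&\*\(\)\+\-\"]/', $word)) {
        $output[] = $word;
    }
}

print_r($output);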

Output:

Array
(
    [0] => Adam-Smith
    [1] => Christine
    [2] => Roger
)

edited Oct 07 '22 at 11:47

answered Oct 07 '22 at 11:34

Thank you, but it works only partially. It doesn't take into account (), +, &, %, @, # and numbers – Astw41 Oct 07 '22 at 11:39

Jacob Mulquin · Accepted Answer · 2022-10-07T11:50:53.903

$raw = 'Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12, johnny@knoxville, some(person), thing+asdf, Jude "The Law" Law, discord#124123, 100% A real person, shouldntadd.com';

$regex = '/[^A-Za-z0-9\s\-\(\)\"\+\&\%\@\#]/';

$tags = array_map('trim', explode(',', $raw));

$tags = array_filter($tags, function ($tag) use ($regex) {
    return !preg_match($regex, $tag);
});

var_dump($tags);

Yields:

array(9) {
    [0]=>
    string(10) "Adam-Smith"
    [1]=>
    string(9) "Christine"
    [3]=>
    string(5) "Roger"
    [6]=>
    string(16) "johnny@knoxville"
    [7]=>
    string(12) "some(person)"
    [8]=>
    string(10) "thing+asdf"
    [9]=>
    string(18) "Jude "The Law" Law"
    [10]=>
    string(14) "discord#124123"
    [11]=>
    string(18) "100% A real person"
}
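
Because array_filter() preserves the original keys, the filtered array can have gaps in its numbering. If sequential keys are wanted, one option is to wrap the result in array_values(). The following is a minimal sketch using the question's shorter input; the variable names $filtered and $reindexed are illustrative and not part of the answer.

```php
<?php
// Sketch: re-index the filtered tags so the keys run from 0 upwards.
$raw = 'Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12';
$regex = '/[^A-Za-z0-9\s\-\(\)\"\+\&\%\@\#]/';

$tags = array_map('trim', explode(',', $raw));

// array_filter() keeps the keys of the surviving items.
$filtered = array_filter($tags, function ($tag) use ($regex) {
    return !preg_match($regex, $tag);
});

// array_values() discards those keys and numbers the items from 0.
$reindexed = array_values($filtered);

var_dump(array_keys($filtered));
var_dump($reindexed);
```

Re-indexing only matters when later code relies on consecutive numeric keys, for example when the array is encoded as a JSON list.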

If you want to include a full stop as an allowable character (for example, if you were checking for email addresses), you can add \. inside the character class, just before the closing ].
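
As a rough illustration of the full stop variant, the sketch below places the dot inside the character class of the pattern above. The sample value 'johnny@knoxville.example' and the variable name $allowed are made up for this example.

```php
<?php
// Sketch: the same whitelist with a full stop added.
// The dot sits inside the character class, before the closing bracket.
$regex = '/[^A-Za-z0-9\s\-\(\)\"\+\&\%\@\#\.]/';

// Hypothetical sample values for illustration.
$tags = ['shouldntadd.com', 'johnny@knoxville.example', 'Müller'];

// Keep only the tags that contain no character outside the whitelist.
$allowed = array_values(array_filter($tags, function ($tag) use ($regex) {
    return !preg_match($regex, $tag);
}));

var_dump($allowed);
```

Note that this only whitelists the character; it does not validate that a value is a well-formed email address.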

This \answe\r is usin\g e\xcessi\ve an\d u\nnec\essary s\lashes \in t\he \p\atte\rn. — mickmackusa, Oct 10 '22 at 02:02
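
To make the comment above concrete: inside a character class most of those backslashes are optional. The sketch below compares the escaped pattern with a leaner spelling that should behave the same; $minimal and the sample list are illustrative and do not come from any of the answers.

```php
<?php
// Sketch: the same whitelist written with fewer escapes.
$escaped = '/[^A-Za-z0-9\s\-\(\)\"\+\&\%\@\#]/';
$minimal = '/[^A-Za-z0-9\s()"+&%@#-]/'; // hyphen placed last, so it is literal

// Hypothetical sample values for illustration.
$samples = ['Adam-Smith', 'Müller', 'some(person)', 'Jude "The Law" Law', 'thing+asdf', '100% real', 'a.b'];

foreach ($samples as $sample) {
    // Both patterns should agree on whether the sample is rejected.
    $same = preg_match($escaped, $sample) === preg_match($minimal, $sample);
    echo $sample, ': ', $same ? 'same result' : 'different result', PHP_EOL;
}
```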

score 0 · Answer 3 · answered Oct 10 '22 at 01:57

This task can be completed more directly and efficiently than the earlier answers demonstrate. Split on commas (which may have leading or trailing spaces) and also treat any name containing a non-whitelisted character as a delimiter.

The result array will only contain the qualifying names and they will be whitespace trimmed without making any extra calls.

/ *, *|[^,]*[^, a-z\d()\-+&%@#][^,]*/i
#                                    ^- case-insensitive pattern
#      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^--- match names containing at least one non-whitelisted character
#     ^-------------------------------- OR
#^^^^^--------------------------------- optional leading spaces or trailing spaces around a comma

Code:

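// Input string from the question.
$tags = "Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12";

var_export(
    preg_split(
        '/ *, *|[^,]*[^, a-z\d()\-+&%@#][^,]*/i',
        $tags,
        0,
        PREG_SPLIT_NO_EMPTY
    )
);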
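
For reuse, the call could be wrapped in a small function. This is only a sketch: the helper name whitelistedTags() is hypothetical, and the fallback to an empty array when preg_split() fails is an assumption about the desired behaviour, not something the answer specifies.

```php
<?php
// Hypothetical helper wrapping the preg_split() call; the name is illustrative.
function whitelistedTags(string $csv): array
{
    $parts = preg_split(
        '/ *, *|[^,]*[^, a-z\d()\-+&%@#][^,]*/i',
        $csv,
        0,
        PREG_SPLIT_NO_EMPTY
    );

    // preg_split() returns false on error; fall back to an empty list (assumption).
    return $parts === false ? [] : $parts;
}

// Input string from the question.
$tags = 'Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12';
var_export(whitelistedTags($tags));
```

Note that this pattern's whitelist has no double quote, so a name such as Jude "The Law" Law would be treated as a delimiter and dropped.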
