1

I have an array (converted from a string) that contains words with non-standard letters (letters not used in English, such as ć, ä, ü). I don't want to replace those characters; I want to remove every word that contains them.

from [Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12]
to   [Adam-Smith, Christine, Roger]

This is what I have so far:

<?php 
    $tags = "Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12";

    $tags_array = preg_split("/\,/", $tags); 

    $tags_array = array_filter($tags_array, function($value){
       return strstr($value, "a") === false;
    });

    foreach($tags_array as $tag) {
        echo "<p>".$tag."</p>";
    }
?> 

I have no idea how to delete words that contain anything other than [a-z, A-Z, 0-9] and [(), "", -, +, &, %, @, #] characters. Right now the code deletes every word that contains an "a". What should I do to achieve this?

mickmackusa
Astw41
  • What is your meaning when you say `""` between `()` and `-`? I don't know how you want to exclude an empty space (zero-width character). – mickmackusa Oct 10 '22 at 02:00

3 Answers

2

This should do the job:

https://onlinephp.io/c/dd46c

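$tags = ['Adam-Smith', 'Christine', 'Müller', 'Roger', 'Hauptstraße', 'X Æ A-12'];
$output = [];

foreach ($tags as $word) {
    // Keep the word only if it contains no character outside the whitelist
    // (letters, digits, and the listed symbols).
    if (!preg_match('/[^A-Za-z0-9!@#$%^&*()+\-"]/', $word)) {
        $output[] = $word;
    }
}

print_r($output);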

Output:

Array
(
    [0] => Adam-Smith
    [1] => Christine
    [2] => Roger
)
bobi
  • Thank you, but it works only partially. It doesn't take into account (), +, &, %, @, # and numbers – Astw41 Oct 07 '22 at 11:39
1
$raw = 'Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12, johnny@knoxville, some(person), thing+asdf, Jude "The Law" Law, discord#124123, 100% A real person, shouldntadd.com';

$regex = '/[^A-Za-z0-9\s\-\(\)\"\+\&\%\@\#]/';

$tags = array_map('trim', explode(',', $raw));

$tags = array_filter($tags, function ($tag) use ($regex) {
    return !preg_match($regex, $tag);
});

var_dump($tags);

Yields:

array(9) {
  [0]=>
  string(10) "Adam-Smith"
  [1]=>
  string(9) "Christine"
  [3]=>
  string(5) "Roger"
  [6]=>
  string(16) "johnny@knoxville"
  [7]=>
  string(12) "some(person)"
  [8]=>
  string(10) "thing+asdf"
  [9]=>
  string(18) "Jude "The Law" Law"
  [10]=>
  string(14) "discord#124123"
  [11]=>
  string(18) "100% A real person"
}

If you want to allow a full stop as well (for example, if you were checking for email addresses), add \. inside the character class, just before the closing ] of the regex.
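
A minimal sketch of that variation, reusing the pattern above with `\.` placed inside the character class. The sample input string is made up for illustration, and the `array_values()` call is an addition used only to reindex the result:

```php
<?php
// Same whitelist as above, with \. added inside the character class
// so that a full stop is also treated as an allowed character.
$regex = '/[^A-Za-z0-9\s\-\(\)\"\+\&\%\@\#\.]/';

// Hypothetical sample input, chosen for illustration only.
$raw = 'Roger, Müller, shouldntadd.com, jane.doe@example.com';

$tags = array_map('trim', explode(',', $raw));

// array_filter() preserves the original keys, so array_values() reindexes from 0.
$tags = array_values(array_filter($tags, function ($tag) use ($regex) {
    return !preg_match($regex, $tag);
}));

var_dump($tags);
```

With this input, "Roger", "shouldntadd.com" and "jane.doe@example.com" are kept, while "Müller" is still dropped.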

Jacob Mulquin
0

This task can be completed more directly and efficiently than the earlier answers demonstrate. Split on commas, which may have leading or trailing spaces, and also treat any name containing a non-whitelisted character as a delimiter.

The result array will only contain the qualifying names and they will be whitespace trimmed without making any extra calls.

/ *, *|[^,]*[^, a-z\d()\-+&%@#][^,]*/i
#                                    ^- case-insensitive pattern
#      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^--- match names containing at least one non-whitelisted character
#     ^-------------------------------- OR
#^^^^^--------------------------------- optional leading spaces or trailing spaces around a comma

Code: (Demo)

var_export(
    preg_split(
        '/ *, *|[^,]*[^, a-z\d()\-+&%@#][^,]*/i',
        $tags,
        0,
        PREG_SPLIT_NO_EMPTY
    )
);
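
As a quick check, the same call can be run against the sample string from the question. This is only a sketch: the wrapper function name `filterTags` is hypothetical, and the pattern is copied unchanged from above.

```php
<?php
// Hypothetical helper wrapping the preg_split() call shown above.
function filterTags(string $tags): array
{
    // Commas (with optional surrounding spaces) and any name containing a
    // non-whitelisted character are both consumed as delimiters;
    // PREG_SPLIT_NO_EMPTY drops the empty pieces left between them.
    return preg_split(
        '/ *, *|[^,]*[^, a-z\d()\-+&%@#][^,]*/i',
        $tags,
        0,
        PREG_SPLIT_NO_EMPTY
    );
}

$result = filterTags('Adam-Smith, Christine, Müller, Roger, Hauptstraße, X Æ A-12');
var_export($result);
```

For the question's sample string this leaves Adam-Smith, Christine and Roger, already trimmed and indexed from 0.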
mickmackusa