Out of a MySQL database I have a list of names like

  • Smith
  • Frank
  • Dent MD
  • Smith Sr.
  • Jones, jr.
  • Smith-Jones
  • O'Toole

I need to get that list to be

  • smith
  • frank
  • dent
  • smith
  • jones
  • smith-jones
  • otoole

By format I mean I only want the "main" part of the last name eliminating any

  • non-alphanumeric characters
  • spaces
  • Titles (jr, sr, MD, etc...)

I realize in some cases this is "changing" the person's name but it's not being used in any way that they see.

Right now I am doing something like:

$toReplace  = array('.', ',', '-', ' jr', ' sr', ' MD', ' DO', "'", ' ');
//For each result from my query
    $lname = str_replace($toReplace, '', $row_rsgetUsers['lname']);
    $lname = strtolower($lname);

Then, after a while, a name like "Wright CISA" shows up, and I have to update my $toReplace array to account for it. (I have no control over how the names are entered.)

Is that the best way to go about doing this or is there a better way/library out there I should be using that eliminates the need for me to manually update my $toReplace array occasionally?

Jason
  • First of all, define "standardise". What is the "standard"? – deceze Mar 24 '14 at 13:40
  • Personally I'd use a regexp that matches the part you're looking for: `#^([a-z'-]+)#i`, then str_replace the single quote and lowercase the result. – Tularis Mar 24 '14 at 13:41
  • What if "jr" or "sr" is inside a person's name? This will fail. – Alma Do Mar 24 '14 at 13:42
  • The person named 'O'Toole' might not be very happy about changing his name to 'otoole'. – Vatev Mar 24 '14 at 13:42
  • Note, the replacement of `'-'` will end up changing "Smith-Jones" into "smithjones". It might not be worth the trouble to correct that while still replacing `'-'` elsewhere. – cHao Mar 24 '14 at 14:06
  • @AlmaDo - You're correct - I've updated my example – Jason Mar 24 '14 at 14:34
  • @deceze - Apologies, I changed the title and question a bit - "format" might be a better term. I've updated the question with how I'm looking to format the name. – Jason Mar 24 '14 at 14:36
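The regex approach suggested in the comments can be sketched roughly as follows. The function name `mainSurname` is made up for illustration. The sketch assumes the surname always comes first and consists of ASCII letters, apostrophes and hyphens only, so a name like "Van Dyke" would be cut to "van" and accented letters would end the match early.

```php
<?php
// Hypothetical helper (not from the thread): keep the leading run of
// letters, apostrophes and hyphens, then drop apostrophes and lowercase.
function mainSurname(string $name): string
{
    // Assumption: the "main" part is at the start and is ASCII-only.
    if (!preg_match("#^[a-z'-]+#i", $name, $match)) {
        return '';
    }
    return strtolower(str_replace("'", '', $match[0]));
}

$names = array('Smith', 'Dent MD', 'Jones, jr.', 'Smith-Jones', "O'Toole", 'Wright CISA');
foreach ($names as $name) {
    echo mainSurname($name), "\n";
}
```

Because this matches what to keep rather than listing what to strip, a new credential such as CISA needs no code change, at the cost of truncating surnames that contain a space.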

2 Answers

This should do what you asked, though you could have found it yourself here on Stack Overflow: "Replace all characters except letters, numbers, spaces and underscores" and "How do I remove everything after a space in PHP?"

<?php
    $user_list = array('Smith', 'Frank', 'Dent MD', 'Smith Sr.', 'Jones, jr.', 'Smith-Jones', "O'Toole");
    print_r($user_list);
    $temp_array = array();
    foreach ($user_list as $user_string) {
        // remove everything except spaces and word characters, then lowercase
        // (this also removes hyphens, so "Smith-Jones" becomes "smithjones")
        $string = strtolower(preg_replace("/[^ \w]+/", "", $user_string));
        // keep only the part before the first space, e.g. "jones jr" becomes "jones"
        $parts = explode(' ', $string);
        // collect the formatted name
        $temp_array[] = $parts[0];
    }
    print_r($temp_array);
?>

Array
(
    [0] => Smith
    [1] => Frank
    [2] => Dent MD
    [3] => Smith Sr.
    [4] => Jones, jr.
    [5] => Smith-Jones
    [6] => O'Toole
)
Array
(
    [0] => smith
    [1] => frank
    [2] => dent
    [3] => smith
    [4] => jones
    [5] => smithjones
    [6] => otoole
)
CrazySabbath
  • Except for a situation where the last name is two names separated by a space this does the trick. – Jason Mar 24 '14 at 20:10
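One possible way to keep two-word last names is to peel title-like tokens off the end instead of cutting at the first space. The sketch below is only an illustration: the function name `surnameKey`, the short list of generational suffixes, and the rule that an all-caps trailing token (MD, DO, CISA) is a credential are all assumptions. A surname stored entirely in capitals, or a mixed-case credential such as "PhD", would defeat the heuristic.

```php
<?php
// Hypothetical helper; the suffix list and the all-caps rule are assumptions.
function surnameKey(string $name): string
{
    $generational = array('jr', 'sr', 'ii', 'iii', 'iv');

    // Split on whitespace and commas, dropping empty pieces.
    $tokens = preg_split('/[\s,]+/', trim($name), -1, PREG_SPLIT_NO_EMPTY);

    // Remove trailing tokens that look like titles, always keeping the first token.
    while (count($tokens) > 1) {
        $last = end($tokens);
        $bare = strtolower(rtrim($last, '.'));
        $isGenerational = in_array($bare, $generational, true);
        // Credentials such as MD or CISA are usually written in all caps.
        $isAllCaps = preg_match('/^[A-Z]{2,}\.?$/', $last) === 1;
        if (!$isGenerational && !$isAllCaps) {
            break;
        }
        array_pop($tokens);
    }

    // Join the remaining tokens, lowercase, and keep only letters, digits and hyphens.
    $joined = strtolower(implode('', $tokens));
    return preg_replace('/[^a-z0-9-]+/', '', $joined);
}

echo surnameKey('Van Dyke MD'), "\n";   // prints "vandyke"
echo surnameKey('Wright CISA'), "\n";   // prints "wright"
```

The all-caps rule is what avoids editing a replacement list for each new credential; the generational suffixes still need an explicit list because "jr" and "sr" are not reliably capitalised.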

This function transliterates non-ASCII characters to ASCII, lowercases the string, and replaces runs of other characters with hyphens (source: The Source CookBook).

$names = array( 'Smith', 'Frank', 'Dent MD', 'Smith Sr.', 'Jones, jr.', 'Smith-Jones', "O'Toole" );

foreach( $names as &$value ) {
    $value = slugify( $value );
}
unset( $value ); // break the reference left over from the loop

print_r( $names );

function slugify( $text ) {

    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d]+~u', '-', $text);  
    $text = trim($text, '-');

    /**
     * //IGNORE//TRANSLIT avoids errors on untranslatable characters while still transliterating the rest:
     * //TRANSLIT enables transliteration into the output charset
     * //IGNORE silently discards characters that cannot be represented in the target charset
    */
    $text = iconv('utf-8', 'ASCII//IGNORE//TRANSLIT', $text);   
    $text = strtolower(trim($text));

    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);

    return empty($text) ? '' : $text ;
}

Output:

Array
(
    [0] => smith
    [1] => frank
    [2] => dent-md
    [3] => smith-sr
    [4] => jones-jr
    [5] => smith-jones
    [6] => o-toole
)
Danijel
  • This would work except for the fact that I need to eliminate the -md, -sr, -jr, etc. – Jason Mar 24 '14 at 20:11
  • If the slugs are only used internally, it should not be a problem if they contain titles. Otherwise, what if a person has two titles, or the titles come before the person's name? There are all sorts of combinations to worry about. – Danijel Mar 25 '14 at 10:38
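If the titles do have to go, one option is to filter the slug's hyphen-separated tokens against a list, which handles titles before the name and multiple titles alike. This is a sketch only: `dropTitleTokens` and its default list are hypothetical, the list still has to be maintained by hand, and a surname that is also on the list (for example "Do") is only preserved by the fallback when nothing else remains.

```php
<?php
// Hypothetical helper; the default title list is an assumption and is not complete.
function dropTitleTokens(string $slug, array $titles = array('jr', 'sr', 'md', 'do', 'dr', 'phd')): string
{
    $tokens = explode('-', $slug);

    // Keep only non-empty tokens that are not in the title list.
    $kept = array_filter($tokens, function ($token) use ($titles) {
        return $token !== '' && !in_array($token, $titles, true);
    });

    // If every token was filtered out, fall back to the original slug.
    return $kept ? implode('-', $kept) : $slug;
}

echo dropTitleTokens('dent-md'), "\n";      // prints "dent"
echo dropTitleTokens('dr-smith-md'), "\n";  // prints "smith"
echo dropTitleTokens('smith-jones'), "\n";  // prints "smith-jones"
```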