14

I am trying to find a way to uppercase a surname while leaving its mixed-case or lowercase prefix untouched.

Example of names and their conversion:

  • MacArthur -> MacARTHUR
  • McDavid -> McDAVID
  • LeBlanc -> LeBLANC
  • McIntyre -> McINTYRE
  • de Wit -> de WIT

There are also surnames that begin with the same letters but need to be fully capitalized, so simply identifying the prefix with a function such as strchr() would not suffice:

  • Macmaster -> MACMASTER
  • Macintosh -> MACINTOSH

The PHP function mb_strtoupper() is not appropriate on its own, as it capitalizes the complete string. strtoupper() is not appropriate for the same reason, and it is also not multibyte-aware, so it can mishandle accented characters in names.
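To illustrate the problem, here is a minimal sketch, assuming the mbstring extension is loaded and the source file is UTF-8. The name Lindström is only an illustrative accented example. It shows why whole-string uppercasing is not enough; it is not a solution.

```php
<?php
// Whole-string uppercasing discards the mixed-case prefix.
echo mb_strtoupper('MacArthur', 'UTF-8'), "\n"; // MACARTHUR, but MacARTHUR is wanted
echo mb_strtoupper('de Wit', 'UTF-8'), "\n";    // DE WIT, but de WIT is wanted

// strtoupper() works byte by byte; what happens to the bytes of "ö"
// depends on the PHP version and locale, so no output is claimed here.
echo strtoupper('Lindström'), "\n";
```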

There are some answers on SO that partly address this, such as "Capitalization using PHP". However, their common shortfall is assuming that every surname beginning with Mac is followed by a capital letter.

The names are capitalized properly in the database, so we can assume that Macarthur is the correct spelling for one person and MacArthur is the correct spelling for another.

JimmyBanks
  • Is there any rule for where the surname starts, like a space or anything? – Abhishek Jan 20 '17 at 15:25
  • @Abhishek What do you mean? – JimmyBanks Jan 20 '17 at 15:27
  • I mean is there any logic which splits first name and surname – Abhishek Jan 20 '17 at 15:28
  • Without being able to express a rule *in words* of how names should be capitalised, nobody will be able to write any code that actually does it. Could the rule be *"everything after the __last__ capital letter should be capitalised"*? Does that fit all sample names you have? – deceze Jan 20 '17 at 15:29
  • @Fred-ii- Poor choice of words. The names are being acquired from the db, but I am looking to set them to uppercase using PHP. I simply meant that using a function such as `strchr()` to identify prefixes wouldn't work – JimmyBanks Jan 20 '17 at 15:32
  • @JimmyBanks Sorry, I deleted my comment asking if it was db related; I only noticed it after seeing *"The names are capitalized properly in the database"*. – Funk Forty Niner Jan 20 '17 at 15:34
  • @deceze Yes, that is correct, the rule in words would be to uppercase the string after the last capital letter, which I now only identify after you have pointed out – JimmyBanks Jan 20 '17 at 15:35
  • Check this comment on the PHP website, there is a user function that might suit your needs http://php.net/manual/es/function.mb-convert-case.php#92317 – Gonzalingui Oct 25 '17 at 21:43
  • See https://stackoverflow.com/questions/1122328/first-name-middle-name-last-name-why-not-full-name – Raedwald Oct 01 '19 at 10:33
  • [Assuming that all people have surnames is incorrect](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/). – Raedwald Oct 01 '19 at 10:34

7 Answers

8

Going with the rule to capitalise everything after the last capital letter:

preg_replace_callback('/\p{Lu}\p{Ll}+$/u', 
                      function ($m) { return mb_strtoupper($m[0]); },
                      $name)

\p{Lu} and \p{Ll} match Unicode uppercase and lowercase letters respectively, and mb_strtoupper is Unicode-aware. For a simple ASCII-only variant, this would do too:

preg_replace_callback('/[A-Z][a-z]+$/', 
                      function ($m) { return strtoupper($m[0]); },
                      $name)
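As a rough sketch of how the Unicode-aware snippet could be used, here it is wrapped in a function. The name uppercaseAfterLastCapital() is made up for illustration, and the mbstring extension and UTF-8 input are assumed.

```php
<?php
// Hypothetical wrapper around the Unicode-aware pattern above.
function uppercaseAfterLastCapital(string $name): string
{
    return preg_replace_callback(
        '/\p{Lu}\p{Ll}+$/u', // last uppercase letter plus the lowercase run up to the end
        function (array $m): string {
            return mb_strtoupper($m[0], 'UTF-8');
        },
        $name
    );
}

foreach (['MacArthur', 'de Wit', 'Macmaster', 'Lindström'] as $name) {
    printf("%s -> %s\n", $name, uppercaseAfterLastCapital($name));
}
```

Note that a name without any capital letter (for example johnson) does not match the pattern and is returned unchanged.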
deceze
3

Here's a basic algorithm that avoids cryptic regular expressions:

  1. Create a multibyte-safe character array for the literal surname (as it exists in the database).
  2. Create a second character array in multibyte-safe capitalized form.
  3. Intersect both arrays to determine the index of the final capitalized character.
  4. Concatenate the literal surname through the index with the capitalized form after the index.

In code form:

<?php
$names = [
    'MacArthur',
    'McDavid',
    'LeBlanc',
    'McIntyre',
    'de Wit',
    'Macmaster',
    'Macintosh',
    'MacMac',
    'die Über',
    'Van der Beek',
    'johnson',
    'Lindström',
    'Cehlárik',
];

// Uppercase after the last capital letter
function normalizeSurname($name) {
    // Split surname into a Unicode character array
    $chars = preg_split('//u', $name, -1, PREG_SPLIT_NO_EMPTY);

    // Capitalize surname and split into a character array
    $name_upper = mb_convert_case($name, MB_CASE_UPPER);
    $chars_upper = preg_split('//u', $name_upper, -1, PREG_SPLIT_NO_EMPTY);

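    // Find the index of the last capitalized letter (0 if there is none).
    // Characters without case, such as spaces, also count as matches here.
    $capital_idxs = array_keys(array_intersect($chars, $chars_upper));
    $last_capital_idx = $capital_idxs ? end($capital_idxs) : 0;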

    // Concatenate the literal surname up to the index, and capitalized surname thereafter
    return mb_substr($name, 0, $last_capital_idx) . mb_substr($name_upper, $last_capital_idx);
}

// Loop through the surnames and display in normalized form
foreach($names as $name) {
    echo sprintf("%s -> %s\n", 
        $name,
        normalizeSurname($name)
    );
}

You'll get output like:

MacArthur -> MacARTHUR
McDavid -> McDAVID
LeBlanc -> LeBLANC
McIntyre -> McINTYRE
de Wit -> de WIT
Macmaster -> MACMASTER
Macintosh -> MACINTOSH
MacMac -> MacMAC
die Über -> die ÜBER
Van der Beek -> Van der BEEK
johnson -> JOHNSON
Lindström -> LINDSTRÖM
Cehlárik -> CEHLÁRIK

This makes the assumption that an entirely lowercase surname should be capitalized. It would be easy to change that behavior.
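One possible way to change that behavior is sketched below. The function name normalizeSurnameOptional() and the $upperIfNoCapital flag are assumptions made for this example, not part of the code above. It also differs in one detail: only cased letters count as capitals, so spaces and apostrophes are ignored when looking for the last capital.

```php
<?php
// Hypothetical variant: all-lowercase surnames are left alone unless asked otherwise.
function normalizeSurnameOptional(string $name, bool $upperIfNoCapital = false): string
{
    // Split into Unicode characters
    $chars = preg_split('//u', $name, -1, PREG_SPLIT_NO_EMPTY);

    // Remember the index of the last character that changes when lowercased
    $lastCapitalIdx = null;
    foreach ($chars as $i => $char) {
        if ($char !== mb_strtolower($char, 'UTF-8')) {
            $lastCapitalIdx = $i;
        }
    }

    // No capital letter found: behavior is controlled by the flag
    if ($lastCapitalIdx === null) {
        return $upperIfNoCapital ? mb_strtoupper($name, 'UTF-8') : $name;
    }

    // Keep the prefix as stored, uppercase from the last capital onward
    return mb_substr($name, 0, $lastCapitalIdx, 'UTF-8')
        . mb_strtoupper(mb_substr($name, $lastCapitalIdx, null, 'UTF-8'), 'UTF-8');
}

echo normalizeSurnameOptional('johnson'), "\n";
echo normalizeSurnameOptional('johnson', true), "\n";
echo normalizeSurnameOptional('die Über'), "\n";
```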

Jeff Standen
  • Interesting solution, but it certainly looks *more* cryptic than a regular expression :) – mr_carrera Oct 29 '17 at 15:55
  • Sure, for something this straightforward where the data is clean, I'd agree. I think [deceze's regex answer](https://stackoverflow.com/a/41767621/321872) is elegant under these conditions. In practice, data may be untrustworthy (user-provided), and the rules may develop edge cases over time. For instance, double-barrelled surnames with mixed capitalization, languages that lack capitalization, etc. That's asking for a regex nightmare, where an approach like mine is more adaptable. – Jeff Standen Oct 30 '17 at 05:45
2

I believe this is the solution to the question:

$names = array(
    'MacArthur',
    'Macarthur',
    'ÜtaTest',
    'de Wit'
);

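// The u modifier makes PCRE treat the subject as UTF-8, so \p{Lu} is not
// matched against the individual bytes of multibyte characters such as ö.
$pattern = '~(?<prefix>(?:\p{Lu}.+|.+\s+))(?<suffix>\p{Lu}.*)~u';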
foreach ($names as $key => $name) {
    if (preg_match($pattern, $name, $matches)) {
        $names[$key] = $matches['prefix'] . mb_strtoupper($matches['suffix']);
    } else {
        $names[$key] = mb_strtoupper($name);
    }
}

print_r($names);

It produces the following result for the input array above:

Array
(
    [0] => MacARTHUR
    [1] => MACARTHUR
    [2] => ÜtaTEST
    [3] => de WIT
)

Brief explanation of the regular expression:

(?<prefix>             # named capture group "prefix"
   (?:                 # non-capturing group
       \p{Lu}.+        # an uppercase letter followed by one or more characters
       |               # OR
       .+\s+           # one or more characters followed by whitespace
   )
)
(?<suffix>             # named capture group "suffix"
    \p{Lu}.*           # an uppercase letter followed by zero or more characters
)
ioseb
  • Small problem with this answer, names such as Lindström and Cehlárik output as LindstrÖM and CehlÁRIK, respectively – JimmyBanks Oct 10 '17 at 00:33
1
$string = "McBain";
preg_match('/([A-Z][a-z]+\h*)$/', $string, $matches);

// Qualifier for when no match is found
if (!empty($matches[1])) {
    // Replace only the last occurrence of the matched string
    // (a plain str_replace() would also change earlier repeats, e.g. in "MacMac")
    $pos = strrpos($string, $matches[1]);
    if ($pos !== false) {
        $upperString = substr_replace($string, strtoupper($matches[1]), $pos, strlen($matches[1]));
    }
} else {
    $upperString = strtoupper($string);
}
print $upperString;

Example Output:

$string = "McBain ";
$upperString = "McBAIN";

$string = "Mac Hartin";
$upperString = "Mac HARTIN";

$string = "Macaroni ";
$upperString = "MACARONI";

$string = "jacaroni";
$upperString = "JACARONI";

$string = "MacMac";
$upperString = "MacMAC";

(I also added \h* to the regex to catch any trailing horizontal whitespace.)

Reference for finding and replacing the last occurrence.

Martin
  • Why does this get a -1 ? – Martin Jan 20 '17 at 15:43
  • This fails for (arguably contrived) names like "MacMac". – deceze Jan 20 '17 at 16:07
  • @deceze jeeez, yeah. Ok, I fixed this now. Looks somewhat less neat and tidy though, but there we go. – Martin Jan 20 '17 at 16:21
  • Now it fails even worse. Not sure why you're bothering with a separate `str_replace`/`strpos` when you're already using regex matching… – deceze Jan 20 '17 at 16:23
  • ahh, I typo'ed `$upperString` in the wrong place. Fixed. @deceze – Martin Jan 20 '17 at 16:28
  • @deceze `str_replace` was originally quick and easy but I hadn't considered the edge cases of string repetition as you pointed out. Using a full regex-only version would probably be smoother with that issue in mind. `:-/` – Martin Jan 20 '17 at 16:30
0
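<?php
// ASCII-only: str_split() and ctype_upper() work on single bytes.
$string = "MacArthur";
$count = 0;      // number of uppercase letters seen so far
$finished = "";
$chars = str_split($string);
foreach ($chars as $char) {
    if (ctype_upper($char)) {
        $count++;
    }
    // Uppercase characters only while exactly two capital letters have been
    // seen, so this handles names with exactly two capitals (e.g. MacArthur).
    if ($count == 2) {
        $finished .= strtoupper($char);
    } else {
        $finished .= $char;
    }
}
echo $finished;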
Mark James
0

Here is code that uppercases everything from the last uppercase letter to the end of the string.

preg_replace_callback('/[A-Z][^A-Z]+$/', function($match) {
  return strtoupper($match[0]);
}, $str);
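The pattern above is ASCII-only. A Unicode-aware variant might look like the sketch below, assuming the mbstring extension and UTF-8 input. The function name uppercaseFromLastCapital() is made up for this example. Because \P{Lu} matches anything that is not an uppercase letter, apostrophes and spaces after the last capital are included in the match.

```php
<?php
// Hypothetical Unicode-aware variant of the pattern above.
function uppercaseFromLastCapital(string $str): string
{
    return preg_replace_callback(
        '/\p{Lu}\P{Lu}+$/u', // last uppercase letter, then non-uppercase characters to the end
        function (array $match): string {
            return mb_strtoupper($match[0], 'UTF-8');
        },
        $str
    );
}

foreach (['MacArthur', 'Lindström', "D'angelo"] as $name) {
    printf("%s -> %s\n", $name, uppercaseFromLastCapital($name));
}
```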

Try it with test examples from your question: https://repl.it/NYcR/5

Shmygol
0

Just to differ from the rest of the answers, you could try something like this:

$names = array(
    'MacArthur',
    'Macarthur',
    'ÜtaTest',
    'de Wit'
);
function fixSurnameA($item) {
    $lname = mb_strtolower($item);
    $nameArrayA = str_split($item,1);
    $nameArrayB = str_split($lname,1);
    $result = array_diff($nameArrayA, $nameArrayB);
    $keys = array_keys($result);
    $key = max($keys);
    if(count($keys)>=2 or (count($keys)==1 and $key>0)) {
        $pre = substr($item, 0, $key);
        $suf = mb_strtoupper(substr($item, $key));
        echo $pre.$suf."\n";
    } else {
        echo $item."\n";
    }
}
function fixSurnameB($item) {
    $lname = mb_strtolower($item);
    $nameArrayA = str_split($item,1);
    $nameArrayB = str_split($lname,1);
    $result = array_diff($nameArrayA, $nameArrayB);
    $keys = array_keys($result);
    $key = max($keys);
    $pre = substr($item, 0, $key);
    $suf = mb_strtoupper(substr($item, $key));
    echo $pre.$suf."\n";
}

array_walk($names,'fixSurnameA');
/* MacARTHUR
   Macarthur
   ÜtaTEST
   de WIT 
*/
array_walk($names,'fixSurnameB');
/* MacARTHUR
   MACARTHUR
   ÜtaTEST
   de WIT 
*/

Test this on PHP SandBox