I care a lot about capitalization (probably too much). So I wrote a function that fixes the capitalization everywhere on my site. I basically want "title case" but with some exceptions... words I don't like to see capitalized and acronyms.

function my_capitals($string)
{
    $uc = ucwords($string);
    $tokens = explode(' ',$uc);
    foreach ($tokens as $key=>$val)
    {
        if ($val == 'Ipa') $tokens[$key] = 'IPA';
        else if ($val == 'Ipas') $tokens[$key] = 'IPAs';
        else if ($val == 'Apa') $tokens[$key] = 'APA';
        else if ($val == 'Apas') $tokens[$key] = 'APAs';
        else if ($val == 'A') $tokens[$key] = 'a';
        else if ($val == 'And') $tokens[$key] = 'and';
        else if ($val == 'The') $tokens[$key] = 'the';
        else if ($val == 'In') $tokens[$key] = 'in';
        else if ($val == 'Or') $tokens[$key] = 'or';
        else if ($val == 'Of') $tokens[$key] = 'of';
        else if ($val == 'To') $tokens[$key] = 'to';
        else if ($val == 'On') $tokens[$key] = 'on';
        else if ($val == 'At') $tokens[$key] = 'at';
        else $tokens[$key] = $val;
    }
    $final = implode(' ',$tokens);
    return $final;
}

Imagine there might be another 10-15 options and that it may be run 3-5 times per page on relatively short strings (one-line descriptions and titles).

My question is this: Is this an efficient way to accomplish this sort of translation? Or should I be figuring out a more efficient way to do it? Is there another alternative I don't know of, as opposed to just switch which probably has similar performance?

JessycaFrederick
  • Possible duplicate of [Which is Faster and better, Switch Case or if else if?](https://stackoverflow.com/questions/10773047/which-is-faster-and-better-switch-case-or-if-else-if) – Haroon Sep 02 '17 at 23:44
  • How about capitals at the start of a new line? Do you want those lowercased too? – deg Sep 03 '17 at 00:26
  • @deg, I hadn't considered new line capitals... I'm not sure if those aren't caught by my current situation. – JessycaFrederick Sep 03 '17 at 01:54
  • @Haroon, I don't think so because I'm not just asking if switch is faster than if/else, I'm asking if there is a better way to do it that may include solutions other than switch. – JessycaFrederick Sep 03 '17 at 01:54
  • I agree it is not a duplicate; the switch mention is incidental. @JessycaFrederick What I mean is that any sentence beginning with And, The and so on will have that first word lowercased by your function. – deg Sep 03 '17 at 01:56
  • Ah, good point. Except I don't believe in starting sentences with And either :) I'll be writing all of the content for the foreseeable future so hopefully I can control for that. – JessycaFrederick Sep 03 '17 at 02:34
  • It looks like you are trying to transform to 'Title Case'. You may not want to start a sentence with 'And', but many start with 'The'. There are many variations of title casing - different styles. Perhaps choose an existing library that suits. Related: https://writers.stackexchange.com/questions/4621/which-words-should-not-be-capitalized-in-title-case – Progrock Sep 03 '17 at 07:20
  • OK, I agree now, and I apologize for marking this as a duplicate. – Haroon Sep 03 '17 at 15:37

3 Answers

First of all, you have to consider what efficient means to you. Are you looking for

  • the shortest execution time
  • the smallest system impact (CPU, RAM, I/O ...)
  • the cleanest code (efficient coding)
  • the shortest code
  • ...

Second, given your details that ...

  1. there are around 30 search terms
  2. strings are short
  3. and code is fired up to 5 times

... unless you are running your script on a toaster, neither execution time nor system impact will give you any kind of headache.

So it really comes down to clean code. For that, you should get to know the difference between == and ===. Next, you already have a string that PHP's string functions can search directly: $uc.
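On the == versus === point, here is a small sketch of why it can matter even when both sides are strings: PHP's loose comparison treats two numeric strings as numbers, while the strict one compares type and contents. The sample values are my own, not from the question.

```php
<?php
// Illustrative values only (not taken from the question).
$a = '1e3';
$b = '1000';

// Loose comparison: both operands are numeric strings, so PHP compares 1000 with 1000.
var_dump($a == $b);   // bool(true)

// Strict comparison: same type (string) but different contents.
var_dump($a === $b);  // bool(false)

// Ordinary words are not numeric strings, so both operators agree here.
var_dump('Ipa' == 'IPA');   // bool(false)
var_dump('Ipa' === 'IPA');  // bool(false)
```

For the word list in the question both operators give the same result, so === is mostly about avoiding surprises later.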

So what about str_replace? It accepts arrays as input.

function my_capitals($string)
{
    $uc = " ".ucwords($string)." ";
    $search = [' Ipa ', ' Ipas ', ' A ', ' Bändy '];
    $replacements = [' IPA ', ' IPAs ', ' a ', ' Cändy '];
    return ucfirst(trim(str_replace($search, $replacements, $uc)));
}

You can even shorten that to 1 line:

function my_capitals($string)
{
    return ucfirst(trim(str_replace([' Ipa ', ' Ipas ', ' A ', ' Bändy '], [' IPA ', ' IPAs ', ' a ', ' Cändy '], " ".ucwords($string)." ")));
}

Just make sure $search and $replacements contain the same number of elements, and that every entry in both has a leading and a trailing space.
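One way to make it hard to get the two arrays out of step is to keep a single map and build the padded arrays from it. This is only a sketch: the name my_capitals_map and the sample entries are my own assumptions, not part of the code above.

```php
<?php
// Hypothetical helper; my_capitals_map() is an illustrative name.
function my_capitals_map(string $string): string
{
    // One map keeps each search term next to its replacement.
    $map = ['Ipa' => 'IPA', 'Ipas' => 'IPAs', 'A' => 'a', 'And' => 'and', 'Of' => 'of'];

    $search = [];
    $replace = [];
    foreach ($map as $from => $to) {
        // Pad both sides so only whole, space-delimited words match.
        $search[]  = ' ' . $from . ' ';
        $replace[] = ' ' . $to . ' ';
    }

    // Pad the input too, so the first and last words can match.
    $padded = ' ' . ucwords($string) . ' ';

    return ucfirst(trim(str_replace($search, $replace, $padded)));
}

echo my_capitals_map('a pint of ipa and a pie'), "\n"; // A Pint of IPA and a Pie
```

Note that the same term twice in a row (for example "a a") shares one space between the two occurrences, so str_replace only catches the first of them.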

Allocating your arrays just once will improve speed on consecutive calls.

function my_capitals1a($string, $searcher, $replacement)
{
    return ucfirst(trim(str_replace($searcher,$replacement, " ".ucwords($string)." ")));
}
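If you would rather not pass the arrays on every call, a static local variable is another way to set them up once per request. A sketch under that assumption follows; my_capitals_static is a made-up name, and whether this is measurably faster depends on your PHP version and opcache, so measure before relying on it.

```php
<?php
// Hypothetical variant; the function name and word list are illustrative.
function my_capitals_static(string $string): string
{
    // static locals are initialised once and keep their value between calls.
    static $search  = [' Ipa ', ' Ipas ', ' A ', ' The ', ' Of '];
    static $replace = [' IPA ', ' IPAs ', ' a ', ' the ', ' of '];

    return ucfirst(trim(str_replace($search, $replace, ' ' . ucwords($string) . ' ')));
}

echo my_capitals_static('the ipa of a lifetime'), "\n"; // The IPA of a Lifetime
```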

Demo and speed comparison: http://sandbox.onlinephpfunctions.com/code/fd594ab47b78778981dc0a58432e141f48f9b6e7
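If you want to repeat a comparison like that locally, a rough timing harness could look like the sketch below. It assumes PHP 7.3+ for hrtime(); time_calls, the two closures and the sample title are my own names and data, not what the linked demo uses.

```php
<?php
// Hypothetical timing helper: runs $fn $rounds times and returns milliseconds.
function time_calls(callable $fn, int $rounds): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $rounds; $i++) {
        $fn('the history of ipa and a tale of two cities');
    }
    return (hrtime(true) - $start) / 1e6;
}

$search  = [' Ipa ', ' A ', ' And ', ' The ', ' Of '];
$replace = [' IPA ', ' a ', ' and ', ' the ', ' of '];

// Variant 1: padded str_replace, as in this answer.
$viaStrReplace = function (string $s) use ($search, $replace): string {
    return ucfirst(trim(str_replace($search, $replace, ' ' . ucwords($s) . ' ')));
};

$map = ['Ipa' => 'IPA', 'A' => 'a', 'And' => 'and', 'The' => 'the', 'Of' => 'of'];

// Variant 2: explode the words and look each one up in a map.
$viaLookup = function (string $s) use ($map): string {
    $words = explode(' ', ucwords($s));
    foreach ($words as $i => $w) {
        if (isset($map[$w])) {
            $words[$i] = $map[$w];
        }
    }
    return ucfirst(implode(' ', $words));
};

printf("str_replace: %.1f ms\n", time_calls($viaStrReplace, 100000));
printf("lookup:      %.1f ms\n", time_calls($viaLookup, 100000));
```

The absolute numbers will vary by machine and PHP version, so only the relative difference on your own setup is worth reading into.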

Edit: word-safe replacement. Edit 2: speed comparison. Edit 3: improved with hints from @Progrock.

rndus2r
  • You may need trailing spaces on those. e.g. 'Andy'. – Progrock Sep 03 '17 at 07:09
  • Thank you for providing a framework to think about this. I'm specifically interested in the shortest execution time. I'm the only one reading the code, and while I comment liberally for my own future sanity, clean or short code is not my highest priority. I had previously reviewed == vs ===. I understood the difference to be about type, and since these will all be strings, I thought I was fine with ==. Turns out === is faster, so thanks! I will explore str_replace efficiency. Thanks! – JessycaFrederick Sep 03 '17 at 15:23
  • @Progrock thanks for pointing that out, I've updated the answer – rndus2r Sep 03 '17 at 17:36
  • @rndus2r, of course other punctuation like hyphens and commas, and final words in the sentence may also pose a problem. – Progrock Sep 03 '17 at 17:39
  • @Progrock well that's a good point, but if you look at his example he is not catching "Andy," nor "Ändy". OP has to point out if sanitizing should be done within this function or if he already has done it outside. – rndus2r Sep 03 '17 at 17:54
  • But Andy should not change to andy? Only a single A should change to a according to the example given from OP. – rndus2r Sep 03 '17 at 18:15
  • @rndus2r, sanitizing happens when these strings go into the database, but admittedly I haven't yet sorted out how to manage umlauts in the system. I haven't seen any other alternate characters show up. I think I'm covered on proper nouns since I'm first capitalizing and then modifying special cases. Also, I'm a "she" not a "he." – JessycaFrederick Sep 03 '17 at 18:28
  • Pardon :) As long as you define your custom search and replacements, you're good to go. I have added a speed comparison to the answer and refined it a bit. It's done with 700,000 rounds, which is around 140,000 to 233,333 page calls in your example. – rndus2r Sep 03 '17 at 19:29
  • @rndus2r, thank you so much for the speed test! I was looking at that site earlier and was thinking about setting something up!! Very interesting results. – JessycaFrederick Sep 04 '17 at 03:26

I would do it this way:

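function my_capitals($string)
{
    $uc = ucwords($string);
    $tokens = explode(' ', $uc);
    // Map each word as ucwords() produces it to the form it should take.
    $exceptions = ['Ipa' => 'IPA', 'Ipas' => 'IPAs'];
    foreach ($tokens as $key => $val)
    {
        // isset() on the array key replaces the whole if/else chain.
        if (isset($exceptions[$val])) {
            $tokens[$key] = $exceptions[$val];
        }
    }
    $final = implode(' ', $tokens);
    return $final;
}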
Andriy Lozynskiy

A slight variation on the OP's code. Capitalise the first letter of every word in the input title, and then swap words like And for and. Finally, capitalise the first letter of the title again.

You could canonicalise your title first with string to lower, but this could break names such as O'Hagan.
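To make the O'Hagan problem concrete, here is a small sketch using only built-in functions. The second argument to ucwords() (a list of delimiter characters) is one possible workaround, shown here with its own side effect.

```php
<?php
$name = "O'Hagan";

// ucwords() only touches the first letter of each whitespace-separated word.
echo ucwords($name), "\n";                    // O'Hagan

// Lowercasing first loses the inner capital.
echo ucwords(strtolower($name)), "\n";        // O'hagan

// Treating the apostrophe as a delimiter restores it...
echo ucwords(strtolower($name), " '"), "\n";  // O'Hagan

// ...but it also capitalises after every other apostrophe.
echo ucwords("don't", " '"), "\n";            // Don'T
```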

There are many other edge cases not covered, and you may be better off trading some speed for a full-featured library.
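As one example of such an edge case, words followed by a comma or full stop are missed when splitting on spaces. A possible sketch that matches runs of letters instead is below; my_capitals_punct and its word list are my own assumptions, it only handles ASCII letters (so no umlauts), and it still leaves plenty of title-case rules uncovered.

```php
<?php
// Hypothetical helper; name, regex and word list are illustrative only.
function my_capitals_punct(string $string): string
{
    $map = [
        'Ipa' => 'IPA', 'Ipas' => 'IPAs', 'A' => 'a',
        'And' => 'and', 'The' => 'the', 'Of' => 'of',
    ];

    // Match runs of ASCII letters/apostrophes, so "And," is seen as the word "And".
    $result = preg_replace_callback(
        "/[A-Za-z']+/",
        function (array $m) use ($map): string {
            // Swap the word if it is in the map, otherwise keep it unchanged.
            return $map[$m[0]] ?? $m[0];
        },
        ucwords($string)
    );

    // Re-capitalise the first letter in case the title starts with a mapped word.
    return ucfirst($result);
}

echo my_capitals_punct('salt, pepper and, of course, ipa.'), "\n";
// Salt, Pepper and, of Course, IPA.
```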

<?php
$replacements = [
    'Ipa'  => 'IPA',
    'Ipas' => 'IPAs',
    'Apa'  => 'APA',
    'Apas' => 'APAs',
    'A'    => 'a',
    'And'  => 'and',
    'The'  => 'the',
    'In'   => 'in',
    'Or'   => 'or',
    'Of'   => 'of',
    'To'   => 'to',
    'On'   => 'on',
    'At'   => 'at',
];

$test_titles = [
    'a tale of two cities'    => 'A Tale of Two Cities', 
    'the secret history'      => 'The Secret History',
    'lord of the flies'       => 'Lord of the Flies',
    'The woman in white'      => 'The Woman in White',
    'of mice and men'         => 'Of Mice and Men',
    'the andy warhol diaries' => 'The Andy Warhol Diaries'
];

foreach($test_titles as $input => $title_cased) {
    $words = [];
    foreach(explode(' ', ucwords($input)) as $word) {
        $words[] = isset($replacements[$word]) ? $replacements[$word] : $word;
    }
    $transformed = ucfirst(implode(' ', $words));
    assert($transformed === $title_cased);
}
Progrock
  • Where's the difference from @Andriy Lozynskiy's answer? – rndus2r Sep 03 '17 at 19:53
  • @rndus2r where do I start? Their function has a syntax error, and doesn't work even if corrected. – Progrock Sep 03 '17 at 20:06
  • It is just missing a ) and it works correctly : http://sandbox.onlinephpfunctions.com/code/b84dd027760e073c147da0a4f0831ed851d2fd71 – rndus2r Sep 03 '17 at 20:09
  • @rndus2r, similar granted once fixed and fleshed out. but no `ucfirst` at the end. Which is a slight improvement here. Blindly wrote this, happy to combine/ditch. – Progrock Sep 03 '17 at 20:20
  • Yeah, I've adapted your improvements into my answer as well, and poked him too. – rndus2r Sep 03 '17 at 20:27