
I am coding a tag system for a custom CMS built with CodeIgniter, and I'm trying to enforce a particular format.

Basically, I need the first letter of each word to be capitalized with the exception of the following, which should be lowercase:

  • Articles: a, an, the
  • Coordinating Conjunctions: and, but, or, for, nor, etc.
  • Prepositions (fewer than five letters): with, on, at, to, from, by, etc.

However, if the tag starts with one of the words above, that first word should still be capitalized.

Some examples of properly formatted tags:

  • Game of Thrones
  • Of Mice and Men
  • From First to Last
  • Lord of the Rings
  • Need for Speed

So far I just have:

$tag = 'Lord of the Rings';
$tag = ucwords($tag); 

$patterns = array('/A/', '/An/', '/The/', '/And/', '/Of/', '/But/', '/Or/', '/For/', '/Nor/', '/With/', '/On/', '/At/', '/To/', '/From/', '/By/' );
$lowercase = array('a', 'an', 'the', 'and', 'of', 'but', 'or', 'for', 'nor', 'with', 'on', 'at', 'to', 'from', 'by' );

$formatted_tag = preg_replace($patterns, $lowercase, $tag);

// capitalize first letter of string
$formatted_tag = ucfirst($formatted_tag);

echo $formatted_tag;

This produces the correct result of Lord of the Rings, but how can I avoid duplicating the arrays? It's tedious matching them up when I add new words.

I'm sure there are some words I'm missing that should be included. Are there any existing functions or classes I can use?

Edited by brasofilo; asked by Motive

1 Answer


You don't need the $lowercase array if you use a custom callback with preg_replace_callback(). Also, your current method needs word boundaries; otherwise it'll replace Android with android or bAnd with band. Finally, running a separate regex for each of the N words is inefficient and unnecessary, since a single regex can match all of them.
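To see the missing-boundary problem concretely, here is a small sketch that wraps the question's two arrays in a function. The name question_format() and the input 'Games for Android' are hypothetical, chosen only for illustration. Because '/A/' matches a capital A anywhere, the A in "Android" is lowercased even though it is not the article "a".

```php
<?php
// The question's approach, wrapped in a hypothetical function for testing.
function question_format($tag)
{
    $patterns  = array('/A/', '/An/', '/The/', '/And/', '/Of/', '/But/', '/Or/', '/For/', '/Nor/', '/With/', '/On/', '/At/', '/To/', '/From/', '/By/');
    $lowercase = array('a', 'an', 'the', 'and', 'of', 'but', 'or', 'for', 'nor', 'with', 'on', 'at', 'to', 'from', 'by');

    // No \b anchors: each pattern matches its letters anywhere, even mid-word.
    return ucfirst(preg_replace($patterns, $lowercase, ucwords($tag)));
}

echo question_format('Lord of the Rings'), "\n";  // Lord of the Rings
echo question_format('Games for Android'), "\n";  // Games for android
```

The first tag comes out right, but the second shows "Android" being mangled by the unanchored '/A/' pattern.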

I would just keep a words array:

$words = array('A', 'An', 'The', 'And', 'Of', 'But', 'Or', 'For', 'Nor', 'With', 'On', 'At', 'To', 'From', 'By' );

And create one dynamic regex, complete with word boundaries, like so:

$regex = '/\b(' . implode( '|', $words) . ')\b/i';
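As a quick check of what implode() builds, the sketch below prints the final pattern. The preg_quote() step is an optional addition that is not part of the answer: it escapes regex metacharacters so any future entries in the list are matched literally. For plain words like these it changes nothing.

```php
<?php
$words = array('A', 'An', 'The', 'And', 'Of', 'But', 'Or', 'For', 'Nor', 'With', 'On', 'At', 'To', 'From', 'By');

// Optional hardening (not in the answer): escape each word for use inside
// a pattern delimited by '/'.
$escaped = array_map(function ($word) {
    return preg_quote($word, '/');
}, $words);

// One alternation, anchored by \b on both sides, case-insensitive via /i.
$regex = '/\b(' . implode('|', $escaped) . ')\b/i';

echo $regex, "\n"; // /\b(A|An|The|And|Of|But|Or|For|Nor|With|On|At|To|From|By)\b/i
```

The order of the alternatives does not matter here: for the word "An", the engine first tries "A", the trailing \b fails because "n" follows, and it backtracks to the "An" alternative.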

And now replace all of the matches with their lowercase counterparts:

$formatted_tag = preg_replace_callback( $regex, function( $matches) {
    return strtolower( $matches[1]);
}, $tag);
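Putting the pieces together, here is one possible end-to-end sketch. It is an assembly under assumptions, not code from the answer: it keeps the question's ucwords() and ucfirst() steps around the single-regex callback, and the function name format_tag() is hypothetical.

```php
<?php
// Hypothetical helper combining the question's ucwords()/ucfirst() steps
// with the answer's single case-insensitive regex and callback.
function format_tag($tag)
{
    $words = array('A', 'An', 'The', 'And', 'Of', 'But', 'Or', 'For', 'Nor', 'With', 'On', 'At', 'To', 'From', 'By');
    $regex = '/\b(' . implode('|', $words) . ')\b/i';

    // Uppercase the first letter of every word; existing capitals are kept.
    $tag = ucwords($tag);

    // Lowercase every listed minor word, whole words only, in any casing.
    $tag = preg_replace_callback($regex, function ($matches) {
        return strtolower($matches[1]);
    }, $tag);

    // The first word is always capitalized, even if it is a minor word.
    return ucfirst($tag);
}

echo format_tag('of mice and men'), "\n";       // Of Mice and Men
echo format_tag('from first to last'), "\n";    // From First to Last
echo format_tag('Games for Android'), "\n";     // Games for Android
```

ucwords(), strtolower() and ucfirst() work byte by byte, so this sketch assumes ASCII tags; tags with non-ASCII letters would need the mb_* functions (such as mb_strtolower()) instead.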
Answered by nickb
  • Awesome, didn't know about _callback. One problem I'm noticing is if someone types Lord of THE Rings, the 'THE' stays in all caps. I considered just making the entire string lowercase before the ucwords(), but I don't want to lose all of the capitalization in cases like WoW (for World of Warcraft) where Wow wouldn't make sense. How can I change it to be case insensitive? – Motive Aug 08 '12 at 18:57
  • @MotiveKyle - That's simple, add the `/i` modifier to the regex: `'/\b(' . implode( '|', $words) . ')\b/i';` I edited it into my answer. – nickb Aug 08 '12 at 18:59
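The sketch below, using illustrative inputs, compares the pattern with and without the /i modifier. It also shows why lowercasing the whole string before ucwords(), as considered in the comment above, would flatten an intentional capital like WoW.

```php
<?php
$words = array('A', 'An', 'The', 'And', 'Of', 'But', 'Or', 'For', 'Nor', 'With', 'On', 'At', 'To', 'From', 'By');
$alternation = '\b(' . implode('|', $words) . ')\b';

// Callback that lowercases the matched minor word.
$to_lower = function ($matches) {
    return strtolower($matches[1]);
};

$tag = ucwords('Lord of THE Rings'); // "Lord Of THE Rings"

// Without /i, only the exact casing in $words matches, so "THE" survives.
$case_sensitive = preg_replace_callback('/' . $alternation . '/', $to_lower, $tag);

// With /i, "THE" is matched and lowercased as well.
$case_insensitive = preg_replace_callback('/' . $alternation . '/i', $to_lower, $tag);

// Lowercasing everything first loses intentional capitals...
$flattened = ucwords(strtolower('WoW'));

// ...whereas the regex approach leaves words outside the list untouched.
$preserved = ucfirst(preg_replace_callback('/' . $alternation . '/i', $to_lower, ucwords('WoW')));

echo $case_sensitive, "\n";   // Lord of THE Rings
echo $case_insensitive, "\n"; // Lord of the Rings
echo $flattened, "\n";        // Wow
echo $preserved, "\n";        // WoW
```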