
Right now, I am implementing this with a split, slice, and implosion:

$exploded = implode(' ',array_slice(preg_split('/(?=[A-Z])/','ThisIsATest'),1));
//$exploded = "This Is A Test"

Prettier version:

$capital_split = preg_split('/(?=[A-Z])/','ThisIsATest');
$blank_first_ignored = array_slice($capital_split,1);
$exploded = implode(' ',$blank_first_ignored);

However, the problem is when you have input like 'SometimesPDFFilesHappen', which my implementation would (incorrectly) interpret as 'Sometimes P D F Files Happen'.

How can I (simply) get my script to condense 'P D F' to 'PDF'?

My rule for a run of capitals would be that the acronym starts at the first capital and ends one before the last, because the last capital starts the next word.

Yes, I know there are some ambiguities, like in 'ThisIsAPDFTest', which would be interpreted as 'This Is APDF Test'. However, I can't think of a "smart" way to avoid this, so it is an acceptable compromise.

ThinkingStiff
Austin Hyde
  • See also: http://stackoverflow.com/questions/3103730/is-there-a-elegant-way-to-parse-a-word-and-add-spaces-before-capital-letters/3103795#3103795 – Peter Boughton Jul 18 '10 at 15:18

1 Answer

$input = "SomePDFFile";

$pass1 = preg_replace("/([a-z])([A-Z])/","\\1 \\2",$input);
$pass2 = preg_replace("/([A-Z])([A-Z][a-z])/","\\1 \\2",$pass1);
echo $pass2;
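
For reuse, the two passes can be wrapped in a function. This is only a sketch: the name split_camel_case is made up for illustration, and it assumes ASCII letters only, as the patterns above do.

```php
<?php
// Hypothetical helper name; wraps the two passes shown above.
function split_camel_case(string $input): string
{
    // Pass 1: put a space between a lowercase letter and the capital after it.
    $pass1 = preg_replace('/([a-z])([A-Z])/', '$1 $2', $input);
    // Pass 2: inside a run of capitals, split before the last capital
    // when it is followed by a lowercase letter (it starts the next word).
    return preg_replace('/([A-Z])([A-Z][a-z])/', '$1 $2', $pass1);
}

// Inputs taken from the question.
foreach (['ThisIsATest', 'SometimesPDFFilesHappen', 'ThisIsAPDFTest'] as $sample) {
    echo split_camel_case($sample), "\n";
}
```

The last input still comes out as 'This Is APDF Test', which is the compromise the question already accepts.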

or, if you are very religious about having 1 statement:

preg_replace("/(([a-z])([A-Z])|([A-Z])([A-Z][a-z]))/","\\2\\4 \\3\\5",$input);

which is very ugly.
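
A possible single-statement alternative, not part of the original answer, is to use lookarounds so that the pattern matches only the position between two characters and consumes nothing. The capture-group version above consumes the letters it matches, so two boundaries that share a letter (the 'A' in 'ThisIsATest') cannot both match in one pass. The function name below is hypothetical, and the sketch again assumes ASCII letters.

```php
<?php
// Hypothetical helper name; a lookaround-based sketch of the same two rules.
function split_camel_case_lookaround(string $input): string
{
    // Insert a space at every position that is either
    //   - after a lowercase letter and before a capital, or
    //   - after a capital and before a capital followed by a lowercase letter.
    // Both alternatives are zero-width, so no letters are consumed.
    return preg_replace(
        '/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/',
        ' ',
        $input
    );
}

echo split_camel_case_lookaround('SomePDFFile'), "\n";
echo split_camel_case_lookaround('ThisIsATest'), "\n";
```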

mvds
  • +1 you saved me writing a question and then getting slapped for a duplicate. – zaf Oct 01 '10 at 15:01
  • ++; Thank you, this passed all my unit tests that other algorithms failed. I really need to learn regex. – Dolph Nov 14 '10 at 01:31
  • Added a unit test for my own purposes, and replaced the first argument on pass1 with "/([a-z])([A-Z][a-z])/" to prevent "FoO" from being replaced with "Fo O". – Dolph Nov 14 '10 at 01:53
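
The variant from the last comment can be tried with a small sketch like the one below. The function name is hypothetical. The stricter first pattern only splits before a capital that is followed by a lowercase letter, so it leaves 'FoO' alone, but it also seems to stop splitting before single-letter words and before acronyms that follow a lowercase letter. Whether that trade-off is acceptable depends on the inputs you expect.

```php
<?php
// Hypothetical helper name; the two passes with the first pattern
// replaced as described in the comment above.
function split_camel_case_strict(string $input): string
{
    // Pass 1 (stricter): only split when the capital is followed by a lowercase letter.
    $pass1 = preg_replace('/([a-z])([A-Z][a-z])/', '$1 $2', $input);
    // Pass 2: unchanged from the answer.
    return preg_replace('/([A-Z])([A-Z][a-z])/', '$1 $2', $pass1);
}

// Compare the output with the original two-pass version on the same inputs.
foreach (['FoO', 'ThisIsATest', 'SomePDFFile'] as $sample) {
    echo split_camel_case_strict($sample), "\n";
}
```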