Consider the following string:

LoReM {FOO} IPSUM dolor {BAR} Samet {fooBar}

I'm looking for a way to lowercase everything except what is between {brackets}, which should be left untouched. So the desired output is:

lorem {FOO} ipsum dolor {BAR} samet {fooBar}

In another topic @stema pointed to http://de2.php.net/manual/en/functions.anonymous.php to achieve something like this, but I don't understand how:

echo preg_replace_callback('~\{.*?\}~', function ($match) {
  return strtolower($match[1]);
}, 'LoReM {FOO} IPSUM dolor {BAR} Samet {fooBar}');

This returns only the string without the bracketed {tags}, and not even lowercased. Who can help me solve this? Any help is greatly appreciated :)

Pr0no
  • There is no `$match[1]` with your pattern (only 0). Next to that you don't want to lowercase the match but everything else. – hakre Feb 10 '12 at 14:15
  • possible duplicate of [How to let regex ignore everything between brackets?](http://stackoverflow.com/questions/9219072/how-to-let-regex-ignore-everything-between-brackets) – hakre Feb 13 '12 at 13:16
  • Please do not duplicate questions. – hakre Feb 13 '12 at 13:23

7 Answers

3

Change your regex to:

~(?:^|})(.*?)(?:\{|$)~

explanation:

~           : delimiter
  (?:       : start non capture group
    ^|}     : begin of string or }
  )         : end of group
  (         : start capture group #1
    .*?     : any number of any char. non greedy
              (ie: all char outside of {})
  )         : end of group
  (?:       : start non capture group
    \{|$    : { or end of string
  )         : end of group
~           : delimiter
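
To see what capture group #1 actually picks up, here is a minimal sketch that runs the pattern above over the string from the question with preg_match_all(). The variable names ($input, $matches, $groups) are only illustrative.

```php
<?php
$input = 'LoReM {FOO} IPSUM dolor {BAR} Samet {fooBar}';

// Collect every match; $matches[1] holds capture group #1 of each match,
// i.e. the text found outside the {tags}.
preg_match_all('~(?:^|})(.*?)(?:\{|$)~', $input, $matches);
$groups = $matches[1];

var_export($groups);
// Expected: 'LoReM ', ' IPSUM dolor ', ' Samet ', ''
```

The last entry is empty because the sample string ends directly after a closing }.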
Toto
3

Your expression must match the other parts, i.e. the text outside the brackets:

echo preg_replace_callback('~^.*?{|}.*?{|}.*?$~', function ($match) {
  return strtolower($match[0]);
}, 'LoReM {FOO} IPSUM dolor {BAR} Samet {fooBar}');
pät
2

Using preg_replace_callback() is probably the best method. You just need to fix the regular expression to be this instead:

~(^|\})(.*?)(\{|$)~

And then return this:

return $match[1] . strtolower($match[2]) . $match[3];
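
Put together, a minimal sketch of the complete call could look like this, assuming the input always has balanced, non-nested {tags} (the $input and $output names are illustrative):

```php
<?php
$input = 'LoReM {FOO} IPSUM dolor {BAR} Samet {fooBar}';

$output = preg_replace_callback('~(^|\})(.*?)(\{|$)~', function (array $match): string {
    // $match[1]: start of string or "}", $match[3]: "{" or end of string.
    // Only $match[2], the text between them, is lowercased.
    return $match[1] . strtolower($match[2]) . $match[3];
}, $input);

echo $output, PHP_EOL; // lorem {FOO} ipsum dolor {BAR} samet {fooBar}
```

Because each match starts at a } and ends at the next {, the text inside the tags is never part of a match and is therefore copied through unchanged.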
FtDRbwLXw6
  • @hakre: It handles that case just fine? – FtDRbwLXw6 Feb 10 '12 at 14:21
  • @drrcknsln: Not if you actually use upper-case letters in the original string: `Tree } Makes Me Wonder {FOO} { Sick String` - http://codepad.viper-7.com/x1Kihk – hakre Feb 10 '12 at 14:27
  • @hakre: Ah, I think I see what you meant now. It doesn't handle `^([^{]*)\}` and `\{([^}]*)$` cases properly. – FtDRbwLXw6 Feb 10 '12 at 14:28
  • drrcknlsn What black magic this is! Could you explain to me how it works? I don't understand how only $match[2] is lowercased, but then the words are being put in the right places back into the string? @hakre - such a case will never happen. I can say with certainty that all {tags} have an opening and closing bracket (with no brackets therein). – Pr0no Feb 10 '12 at 14:29
  • @Reveller: Basically, this regex ignores what's between `{` and `}`. We aren't concerned with replacing that, because we want to leave it as-is. Instead we find/replace the text that's between `}` and `{`, or beginning of line and `{`, or `}` and end of line. In other words, we leave the `{tags}` alone, and match everything around them. – FtDRbwLXw6 Feb 10 '12 at 14:40
2

You want to match all characters except those within {}, and then replace each match with its strtolower() version.

To do so, you need to create a pattern that matches everything but the bracket-pairs:

~(?:{\w+}(*SKIP)(*FAIL))|[^{}]+~

This will skip all bracket pairs (leaving them untouched) but match everything else that is not a bracket character ({ or }). You can then just lowercase the match using your callback function:

$str = '{LoReM {FOO} IPSUM { dolor {BAR} Samet {fooBar} Tou}Louse';

$out = preg_replace_callback('~(?:{\w+}(*SKIP)(*FAIL))|[^{}]+~', function ($m) {
    return strtolower($m[0]);
}, $str);

echo $out;

Output:

{lorem {FOO} ipsum { dolor {BAR} samet {fooBar} tou}louse

As the example shows, unpaired brackets are not a problem. This pattern also specifies what a bracket pair may contain: \w stands for any word character, and you can replace it with any character class that fulfills your needs if it does not fit (e.g. in your duplicate question).

This is actually pretty similar to a question that has already been answered: How to let regex ignore everything between brackets? It's practically an exact duplicate, which I only noticed after answering in more detail.

hakre
1

How about this.

$input = 'LoReM {FOO} IPSUM dolor {BAR} Samet {fooBar}';
preg_match_all('~\{.*?\}~', $input, $matches);
$output = strtolower($input);
foreach ($matches[0] as $match) {
  $output = str_replace(strtolower($match), $match, $output);
}
Brendan
1

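You can use preg_replace() with the e (PREG_REPLACE_EVAL) modifier, as shown here. Note that this modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so it only works on older versions: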

$string  = 'LoReM {FOO} IPSUM dolor {BAR} Samet {fooBar}';
$pattern = '/(?<![[:word:]{])[[:word:]]*?(?![[:word:]}])/e';
echo preg_replace($pattern, 'strtolower("$0")', $string);

Everything that the pattern matches is then replaced by evaluating strtolower() on the match. If you want to understand the regex, it's easiest to start in the middle (I've separated the blocks with spaces for readability):

(?<![[:word:]{]) [[:word:]]*? (?![[:word:]}])
^                ^            ^
|                |            |
|                +-- match any amount of word characters (alphanums)
|                             |
+-- that are not preceded by a word character or {
                              |
                              +-- and are not followed by a word character or }

Where word characters are alphanumeric characters and underscores.
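
On PHP versions where the e modifier is not available, the same lookaround pattern can be passed to preg_replace_callback() instead. This is a minimal sketch rather than part of the original answer; $lowered is an illustrative name, and it assumes tags contain only word characters, as in the question.

```php
<?php
$string  = 'LoReM {FOO} IPSUM dolor {BAR} Samet {fooBar}';
// Same pattern as above, without the trailing e modifier.
$pattern = '/(?<![[:word:]{])[[:word:]]*?(?![[:word:]}])/';

$lowered = preg_replace_callback($pattern, function (array $m): string {
    // $m[0] is a whole word outside the braces (or an empty match).
    return strtolower($m[0]);
}, $string);

echo $lowered, PHP_EOL; // lorem {FOO} ipsum dolor {BAR} samet {fooBar}
```

The pattern also produces empty matches between non-word characters; lowercasing an empty string is harmless, so they do not change the result.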
nachito
0

This is the type of problem that a regex has a lot of trouble with. A better solution would be to write a parser that reads the input character by character and can switch state.

  • Start in lowercase mode. Output each read character in lower case.
  • If a { character is read in lowercase mode, switch to uppercase mode, in which each read character is output unchanged.
  • If a } character is read in uppercase mode, switch to lowercase mode.

Keep in mind that it will be more complicated if you want to handle nested braces.
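
The steps above can be sketched roughly as follows, ignoring nested braces. The function name lowercaseOutsideBraces() is hypothetical, and the loop works byte by byte, so it assumes the text to be lowercased is ASCII.

```php
<?php
// Hypothetical helper: lowercases everything outside {...}, copies the rest as-is.
function lowercaseOutsideBraces(string $input): string
{
    $output = '';
    $insideBraces = false; // false = lowercase mode, true = unchanged mode

    for ($i = 0, $length = strlen($input); $i < $length; $i++) {
        $char = $input[$i];

        if (!$insideBraces && $char === '{') {
            $insideBraces = true;   // entering a {tag}
        } elseif ($insideBraces && $char === '}') {
            $insideBraces = false;  // leaving a {tag}
        }

        // Braces themselves are unaffected by strtolower().
        $output .= $insideBraces ? $char : strtolower($char);
    }

    return $output;
}

echo lowercaseOutsideBraces('LoReM {FOO} IPSUM dolor {BAR} Samet {fooBar}'), PHP_EOL;
// lorem {FOO} ipsum dolor {BAR} samet {fooBar}
```

An unclosed { simply leaves the rest of the string unchanged, which may or may not be the behavior you want.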

Swiss