
I have some text that's wrapped in [quote][/quote], and I'm trying to match all text before those tags, everything in between those tags, and everything after those tags. The catch is that the tags may occur multiple times, but never nested inside each other.

I'm doing this because I want to run a filter on all the text outside of those tags, whether there are multiple occurrences or not.

This is what I'm starting to work with:

preg_match_all("/(^.*)\[quote\](.*?)\[\/quote\](.*)/si", $reply['msg'], $getthequotes);

Here's the output:

Array
(
[0] => Array
    (
        [0] => putting some stuff before the quote
[quote][b]Logan said[/b][br]testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote]

yep

http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA

adding a quote

[quote][b]Logan said[/b][br]This is the start of the second quote http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote]

[i]04/07/12 20:18:07: Edited by Logan(2)[/i]
    )

[1] => Array
    (
        [0] => putting some stuff before the quote

[quote][b]Logan said[/b][br]testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote]

yep

http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA

adding a quote


    )

[2] => Array
    (
        [0] => [b]Logan said[/b][br]This is the start of the second quote http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i]
    )

[3] => Array
    (
        [0] => 

[i]04/07/12 20:18:07: Edited by Logan(2)[/i]
    )

)

As you can see, it's not producing the desired output. Any help would be appreciated.

Logan Best
  • Ahh... a markup language that isn't HTML -- surely regexes will finally be the correct tool? – Kerrek SB Apr 08 '12 at 00:37
  • I have custom BBCode-like tags that get parsed into HTML. All regex parsing is done in PHP. – Logan Best Apr 08 '12 at 00:42
  • I'm sorry, I was being a bit sarcastic, in light of this [extremely prevalent fallacy](http://stackoverflow.com/a/1732454/596781). The answer is, *don't* use regexes for this, as they're not the right tool. – Kerrek SB Apr 08 '12 at 00:45
  • What might you suggest using? +1 on that link lol – Logan Best Apr 08 '12 at 00:52
  • You need a proper parser for your markup language. – Kerrek SB Apr 08 '12 at 00:56
  • The markup is literally text and 5 or 6 custom BBCode-like tags that are being parsed already. I'm trying to write something that runs before that parser to modify any text outside of the `[quote]` tags. If what I have that's been working for years isn't proper, then what is? – Logan Best Apr 08 '12 at 00:59
  • It depends on your markup language. Does it allow nested tags? If not, perhaps regexes suffice, though you should first prove that you have a regular language (which shouldn't be hard). If yes, and if your language turns out not to be regular, you simply *cannot* use regexes to parse every possible valid markup correctly. – Kerrek SB Apr 08 '12 at 01:03
  • It's a regular language and pretty locked down to 6 specific tags. Anything else gets thrown out. It is only nested in a way that you can have `[b], [i], [url=something], [img=src], [color=hexcode]` inside of a `[quote][/quote]` or inside of each other. You cannot have a `[quote]` inside of another `[quote]`. It will get thrown out. Knowing that, I'm only capturing everything **before** `[quote]stuff[/quote]`, everything **between** `[quote][/quote]`, and everything **after** `[quote]stuff[/quote]`. – Logan Best Apr 08 '12 at 01:12

2 Answers


I haven't tried this, but if you only want the stuff before [quote] and after [/quote], you could use strpos to find the first occurrence of the opening quote tag. Everything before that position is not quoted.

Next you can use strpos starting from the index of the first matching quote tag to find the closing quote tag. You can discard this stuff.

Now do another strpos for the next quote block using a starting position of the closing quote tag you just found. You can repeat this until you get to the end.
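The steps above could be sketched roughly as follows. This is only an illustration, not tested against the real forum data: `filterOutsideQuotes` and its `$filter` callback are hypothetical names, it assumes quotes are never nested, and it uses `stripos` rather than `strpos` so that tag matching is case-insensitive like the `/i` flag in the question. Instead of discarding the quoted parts, it copies them through unchanged so the text can be put back together.

```php
<?php
// Hypothetical helper (not from the original post): applies $filter to every
// stretch of text outside [quote]...[/quote] and copies the quotes unchanged.
// Assumes quotes are never nested.
function filterOutsideQuotes(string $text, callable $filter): string
{
    $open   = '[quote]';
    $close  = '[/quote]';
    $result = '';
    $offset = 0;

    while (($start = stripos($text, $open, $offset)) !== false) {
        // Look for the closing tag only after the opening tag just found.
        $end = stripos($text, $close, $start + strlen($open));
        if ($end === false) {
            break; // unmatched [quote]: treat the remainder as unquoted text
        }
        $end += strlen($close);

        // Text before the quote is filtered; the quote itself is copied as-is.
        $result .= $filter(substr($text, $offset, $start - $offset));
        $result .= substr($text, $start, $end - $start);
        $offset  = $end;
    }

    // Whatever follows the last quote is unquoted as well.
    return $result . $filter(substr($text, $offset));
}

echo filterOutsideQuotes('before [quote]keep me[/quote] after', 'strtoupper');
// prints: BEFORE [quote]keep me[/quote] AFTER
```

Here `strtoupper` merely stands in for whatever filter needs to run on the unquoted text.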

Gohn67
  • Also, if you *do* want nesting, search for the first `[/quote]` first, then search *backwards* from there for the opening `[quote]` -- this will give you the inner-most quote. Format it as necessary, then rinse and repeat. – mpen Apr 08 '12 at 02:43
  • I'll need all of it. I just need to do extra processing on non-quoted text. I guess I could do this though, saving each in its own var and then concatenating all the parts back together again... kind of ass backwards, but I guess it would work. – Logan Best Apr 08 '12 at 03:06
  • Yes, it should work if you concatenate the parts back together. Sorry about that. Yeah it is kind of a naive algorithm, but it shouldn't be too slow for your purposes. I think I got this idea from the Udacity 101 class actually, where they parsed links in an html page using a similar approach. – Gohn67 Apr 08 '12 at 03:26

It can be done but you will need to make multiple passes over the string.

$string = 'putting some stuff before the quote
[quote][b]Logan said[/b][br]testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote]

yep

http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA

adding a quote

[quote][b]Logan said[/b][br]This is the start of the second quote http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote]

[i]04/07/12 20:18:07: Edited by Logan(2)[/i]putting some stuff before the quote

[quote][b]Logan said[/b][br]testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote]

yep

http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA

adding a quote';

// collapse each run of whitespace into a single space
$string = preg_replace('%\s+%', ' ', $string);
// break the string on a common element: the opening bracket of every tag
$pieces = preg_split('%\[%', $string);
// discard the elements that are nothing but a tag, e.g. "br]" or "/b]"
foreach ($pieces as $key => $value):
    if (substr(trim($value), -1) === ']'):
        unset($pieces[$key]);
    endif;
endforeach;
// finally, strip the leading tag fragment (e.g. "b]") from what remains
foreach ($pieces as $key => $value):
    $pieces[$key] = preg_replace('%.*]%', '', $value);
endforeach;
print_r($pieces);

The result is an array that looks like this:

Array
(
    [0] => putting some stuff before the quote 
    [2] => Logan said
    [4] => testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA
    [6] => did it work?
    [9] => 04/04/12 23:48:46: Edited by Logan(2)
    [13] => 04/04/12 23:55:44: Edited by Logan(2)
    [15] =>  yep http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA adding a quote 
    [17] => Logan said
    [19] => This is the start of the second quote http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA
    [21] => did it work?
    [24] => 04/04/12 23:48:46: Edited by Logan(2)
    [28] => 04/04/12 23:55:44: Edited by Logan(2)
    [31] => 04/07/12 20:18:07: Edited by Logan(2)
    [32] => putting some stuff before the quote 
    [34] => Logan said
    [36] => testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA
    [38] => did it work?
    [41] => 04/04/12 23:48:46: Edited by Logan(2)
    [45] => 04/04/12 23:55:44: Edited by Logan(2)
    [47] =>  yep http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA adding a quote
)
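
Note that the pieces above no longer record which text was inside a quote. If the quoted blocks have to stay intact while only the text around them is filtered, one alternative (a different technique from the code above) is a single `preg_split` with `PREG_SPLIT_DELIM_CAPTURE`. The following is a minimal sketch under the assumption that quotes are never nested; the sample string is made up and `strtoupper` is only a stand-in for the real filter.

```php
<?php
// Hypothetical sample input; quotes are assumed never to be nested.
$text = 'intro [quote][b]Logan said[/b] hi[/quote] middle [quote]again[/quote] end';

// Capturing the whole quote block makes preg_split return it between the
// surrounding pieces, so unquoted and quoted parts alternate in the result.
$parts = preg_split(
    '%(\[quote\].*?\[/quote\])%si',
    $text,
    -1,
    PREG_SPLIT_DELIM_CAPTURE
);

// Even indexes hold text outside the quotes, odd indexes hold the quotes.
foreach ($parts as $i => $part) {
    if ($i % 2 === 0) {
        $parts[$i] = strtoupper($part); // stand-in for the real filter
    }
}

echo implode('', $parts);
// prints: INTRO [quote][b]Logan said[/b] hi[/quote] MIDDLE [quote]again[/quote] END
```

Because empty pieces are kept (no `PREG_SPLIT_NO_EMPTY`), the even/odd alternation should hold even when the text starts or ends with a quote.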
Odyssey