I'm trying to figure out how to create a common newspaper device called a 'pullquote' in a Wordpress post. (This isn't strictly a Wordpress question; it's more of a generic regex question.) I have a tag that surrounds the text in the post. I want to copy the text between the tags (which I know how to do) and then insert it between the 3rd and 4th p tags in the post.

The function below finds the text and strips out the tags, but simply prepends the matched text to the beginning. I need help with targeting the gap between the 3rd and 4th paragraphs.

OR... maybe I'm thinking about this wrong. Perhaps there is some way to target the elements as one can do with jQuery nth-child?

Post:

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<p>And here is a 4th paragraph.</p>

Desired Result

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of Tatort or Bukow & Konig.</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<blockquote class="pullquote">Tatort or Bukow & Konig</blockquote>
<p>And here is a 4th paragraph.</p>

So far this is what I have for code:

function jchwebdev_pullquote( $content ) {
    $newcontent = $content;
    $replacement = '$1';
    $matches = array();
    $pattern = "~\[callout\](.*?)\[/callout\]~s";
    // strip out 'shortcode'
    $newcontent = preg_replace($pattern, $replacement, $content);
    if( preg_match($pattern, $content, $matches)) {
      // now have formatted pullquote 
      $pullquote = '<blockquote class="pullquote">' .$matches[1] . '</blockquote>';
      // now how do I target and insert $pullquote
      // between 3rd and 4th paragraph?
      preg_replace($3rd_4th_pattern, $3rd_4th_replacement,
      $newcontent);
      return $newcontent;
    }
    return $content;    
}
add_filter( 'the_content' , 'jchwebdev_pullquote');

Edit: I want to modify my question to be a bit more Wordpress-specific. Wordpress actually converts newlines to <p> tags. Most Wordpress posts don't even use explicit <p> tags because they are unneeded. The problem with the solutions so far is that they seem to strip out newline characters, so if the post (source text) has newlines, it looks weird.

Typical real world Wordpress post:

If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].

If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.

And here is a 3rd paragraph.


And here is a 5th paragraph.

Wordpress renders it like so:

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<p></p>
<p>And here is a 5th paragraph.</p>

So in a perfect world, I would like to take that 'typical real world post' and have preg_replace render it as :

If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of Tatort or Bukow & Konig.

If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.

And here is a 3rd paragraph.

<blockquote class="callout">Tatort or Bukow & Konig</blockquote>

And here is a 5th paragraph.

...which Wordpress will then render as:

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of Tatort or Bukow & Konig.</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<blockquote class="callout">Tatort or Bukow & Konig</blockquote>
<p>And here is a 5th paragraph.</p>

Maybe this has gotten too far off the mark and I should re-post in a Wordpress forum, but I think all I need is a way to alter the preg_replace to use a newline character as the delimiter instead of the <p> tag, and to figure out how not to strip those newline characters from the returned string.

THANKS FOR ALL THE HELP SO FAR!

jchwebdev

3 Answers

1

You could do this with a single preg_replace call.

$re = "~^(?:(?!/p).)*<p>(?:(?!/p).)*\\[callout\\](.*?)\\[/callout\\].*?</p>(?:[^<>]*<p>.*?</p>){2}[^<]*\\K~s";
$str = "<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>\n<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>\n<p>And here is a 3rd paragraph.</p>\n<p>And here is a 4th paragraph.</p>";
$subst = "<blockquote class=\"pullquote\">$1</blockquote>\n";
$result = preg_replace($re, $subst, $str);
echo $result;

Avinash Raj
  • I tried to run that at regex101 and I couldn't get it to match. What am I missing? – jchwebdev Apr 05 '15 at 07:34
  • This doesn't work properly in Wordpress, but I think that is only because Wordpress isn't quite pure HTML (it creates its own <p> tags, etc.). Thanks, though. – jchwebdev Apr 05 '15 at 18:32
1

Using (.*?</p>){3}\K with the s modifier, you can achieve what you want:

preg_replace("@(.*?</p>){3}\K@s", $pullquote, $content);

I made some changes to your function so that it works correctly:

function jchwebdev_pullquote( $content )
{
    $pattern = "~\[callout\](.*?)\[/callout\]~s";
    if(preg_match($pattern, $content, $matches))
    {
      $content = preg_replace($pattern, '$1', $content);
      $pullquote = '<blockquote class="pullquote">' .$matches[1] . '</blockquote>';
      $content = preg_replace("@(.*?</p>){3}\K@s", $pullquote, $content);
      return $content;
    }
    return $content;    
}
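
One point worth checking with this pattern: preg_replace replaces every match unless a limit is given, so in a post with six or more paragraphs the zero-length match left by \K may be found again after each further group of three paragraphs. A minimal sketch that passes 1 as the fourth argument, so the pullquote is inserted only once (the helper name insert_after_third_paragraph is hypothetical, not part of the answer):

```php
<?php
// Hypothetical helper: insert $pullquote once, right after the third closing </p>.
function insert_after_third_paragraph(string $content, string $pullquote): string
{
    // \K drops everything matched so far from the reported match, which leaves
    // a zero-length match just after the third </p>. The limit of 1 stops
    // preg_replace after that first insertion point.
    $result = preg_replace('@(.*?</p>){3}\K@s', $pullquote, $content, 1);

    // preg_replace returns null on a regex error; fall back to the input.
    return $result ?? $content;
}

$content = '<p>1</p><p>2</p><p>3</p><p>4</p><p>5</p><p>6</p>';
echo insert_after_third_paragraph($content, '<blockquote class="pullquote">Q</blockquote>');
// prints <p>1</p><p>2</p><p>3</p><blockquote class="pullquote">Q</blockquote><p>4</p><p>5</p><p>6</p>
```

Note that $pullquote is still used as a replacement string here, so sequences such as $1 or \1 inside the quoted text would be treated as group references.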

Update #1

Optimized: a single preg_replace avoids applying multiple patterns:

function jchwebdev_pullquote( $content )
{
    $pattern = "\[callout\](.*?)\[/callout\]";
    if(preg_match("@(?s)$pattern@", $content, $matches))
    {
      $content = preg_replace("@(?s)($pattern)((.*?</p>){3})@", '\2\3<blockquote class="pullquote">\2</blockquote>', $content);
      return $content;
    }
    return $content;
}

revo
  • @jchwebdev Oops, I made a mistake with the capturing groups. Check it again now. – revo Apr 05 '15 at 07:47
  • The Update #1 seems to work. There is still an issue, but I think that is more to do with how Wordpress creates its own paragraphs. Is it possible to change the preg_replace pattern from <p> to simply a newline character? – jchwebdev Apr 05 '15 at 18:30
  • @jchwebdev Why do you want to change `<p>` to a newline character? A new line doesn't mean a new paragraph. – revo Apr 05 '15 at 18:54
  • Wordpress (and other CMSs like Drupal and Joomla) automatically renders newlines as <p>. IOW: you would rarely enter a <p> tag in a Wordpress post because it's redundant. – jchwebdev Apr 05 '15 at 19:52
  • Just to be clear, the problem with your solution (at least in Wordpress) is that it strips away newline characters. See my edit above. This is probably my fault for not understanding Wordpress rendering. – jchwebdev Apr 05 '15 at 19:53
  • @jchwebdev You're using the `the_content` filter, which is applied when the post is retrieved from the database, and in your database you have paragraph tags, not newlines. If you want to apply this method to newlines, you should use another filter that is hooked before saving to the database (and run it just once). Another workaround is to disable the `wpautop` filter so line breaks are kept as they are; you can then apply `preg_replace` to newlines by changing the `(.*?</p>){3}` part of the above regex to `(.*?[\n]){3}`. – revo Apr 06 '15 at 07:02
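
The newline-based variant discussed in the comments above could be sketched as follows. This is only an illustration under stated assumptions: the function name pullquote_plain_text is hypothetical, paragraphs are assumed to be separated by one or more blank lines, and the function is assumed to see the raw text before wpautop has added any <p> tags. It splits on the blank-line runs but keeps them, so the original newlines are not stripped.

```php
<?php
// Hypothetical helper for raw, newline-delimited post text (no <p> tags yet).
function pullquote_plain_text(string $content, int $afterParagraph = 3): string
{
    $pattern = '~\[callout\](.*?)\[/callout\]~s';
    if (!preg_match($pattern, $content, $matches)) {
        return $content; // no [callout] in this post
    }

    // Strip the [callout] markers but keep the quoted text in place.
    $content = preg_replace($pattern, '$1', $content);
    $quote = '<blockquote class="pullquote">' . $matches[1] . '</blockquote>';

    // Split on runs of two or more line breaks and keep the separators:
    // even indexes hold paragraphs, odd indexes hold the blank-line runs.
    $parts = preg_split('~(\R{2,})~', $content, -1, PREG_SPLIT_DELIM_CAPTURE);

    $index = 2 * ($afterParagraph - 1);
    if (!isset($parts[$index])) {
        return $content; // fewer paragraphs than requested
    }

    // Append the pullquote as its own block after the chosen paragraph.
    $parts[$index] .= "\n\n" . $quote;

    return implode('', $parts);
}
```

In Wordpress this would presumably be registered on the_content with a priority lower than 10, since wpautop runs at the default priority of 10; whether that ordering holds on a given site is an assumption to verify.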
1

If you want to use PHP HTML/XML parsing, please refer to "How do you parse and process HTML/XML in PHP?".
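
As a rough illustration of the parser route (and of the nth-child-style targeting the question asks about), here is a minimal sketch using PHP's bundled DOMDocument. The function name and the wrapper <div> are assumptions made for this example, and the pullquote text is passed in already extracted. Note that the serializer may escape characters such as & and may adjust whitespace between block elements.

```php
<?php
// Hypothetical helper: insert a pullquote after the n-th top-level <p>.
function insert_pullquote_dom(string $html, string $quoteText, int $n = 3): string
{
    $doc = new DOMDocument();

    // Post fragments are rarely strict HTML (e.g. a bare "&"), so collect
    // parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper <div> gives the fragment a single parent to work with.
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // Collect the top-level paragraphs in document order.
    $paragraphs = [];
    foreach ($wrapper->childNodes as $child) {
        if ($child->nodeName === 'p') {
            $paragraphs[] = $child;
        }
    }
    if (count($paragraphs) < $n) {
        return $html; // not enough paragraphs; leave the post untouched
    }

    $quote = $doc->createElement('blockquote');
    $quote->setAttribute('class', 'pullquote');
    $quote->appendChild($doc->createTextNode($quoteText));

    // Insert right after the n-th paragraph (a null sibling means "append").
    $target = $paragraphs[$n - 1];
    $wrapper->insertBefore($quote, $target->nextSibling);

    // Serialize only the wrapper's children, not the html/body scaffolding.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```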

For a regex solution, try the following:

FIND: (?s)((?:<p>.*?<\/p>\s*){3})

This regex captures the first 3 <p> elements as group 1; the replacement puts them back and adds the blockquote node after them.

REPLACE: $1<blockquote class="pullquote">Tatort or Bukow & Konig</blockquote>\n

Code:

$re = "/(?s)((?:<p>.*?<\\/p>\\s*){3})/"; 
$str = "<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>\n<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>\n<p>And here is a 3rd paragraph.</p>\n<p>And here is a 4th paragraph.</p>"; 
$subst = "$1<blockquote class=\"pullquote\">Tatort or Bukow & Konig</blockquote>\n"; 
$result = preg_replace($re, $subst, $str, 1);
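
The snippet above hardcodes the pullquote text. A sketch of how it might be combined with the [callout] extraction from the question is shown below; the function name pullquote_after_third_p is hypothetical. It uses preg_replace_callback so that characters such as $ in the quoted text are inserted literally rather than read as group references.

```php
<?php
// Hypothetical helper: extract the [callout] text and insert it as a
// blockquote after the third <p>...</p> element.
function pullquote_after_third_p(string $html): string
{
    $callout = '~\[callout\](.*?)\[/callout\]~s';
    if (!preg_match($callout, $html, $m)) {
        return $html; // nothing to pull out
    }

    $quote = '<blockquote class="pullquote">' . $m[1] . '</blockquote>' . "\n";

    // Remove the [callout] markers but keep the text in its paragraph.
    $html = preg_replace($callout, '$1', $html);

    // Re-emit the first three paragraphs, then the quote. The callback's
    // return value is used verbatim, and the limit of 1 stops after the
    // first group of three paragraphs.
    return preg_replace_callback(
        '~(?:<p>.*?</p>\s*){3}~s',
        function (array $match) use ($quote): string {
            return $match[0] . $quote;
        },
        $html,
        1
    );
}
```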

Wiktor Stribiżew