Good day!

I have a massive magazine website that I've just migrated from Divi to X Pro. Inside every post there's a suggestion linking to another post. There are a bit more than 10,000 posts in total, so this is not something editors can fix manually in every post. This element was added within the post content:

<blockquote>
<h3>Te sugerimos</h3>
<p class="entry-title"><a href="https://example.com/post-title/" target="_blank" rel="noopener noreferrer" style="outline: none;"><strong>POST TITLE</strong></a></p>
</blockquote>

It should be just an h3 tag followed by a p tag without that entry-title class and, of course, without the blockquote tag.

That code is just part of the posts. Back on the old Divi website, editors wrote posts normally using the native WP WYSIWYG editor. It was Divi that, for reasons I don't know, applied all these… styles? Anyway, everything carried over to this X Pro-based website when I did the migration.

When I check any post in the WP WYSIWYG editor, it looks normal, but when I view the article online it has that big chunk of text. It's only when I check the HTML tab in the post editor that I see all that garbage code.

To get rid of all that, I'm thinking about using regex, but honestly, I have no idea how to tell a regex to delete every class="entry-title" from a p tag that is inside a blockquote tag, and to delete the blockquote tag too, but only if it has all those elements inside.

This would be a life saver. I'm going crazy here.

Thanks in advance!

Alain

1 Answer

First, let's define the matching regular expression (PCRE-compatible):

~<blockquote>\s*(.+?)<p class="entry-title">(.+?)<\/blockquote>~s

See live at RegExr; click "explain" to understand the expression. Then our replacement:

\1<p>\2

Then, here's a test block with added surrounding content:

<blockquote>
<h3>Te sugerimos</h3>
<p class="entry-title"><a href="https://example.com/post-title/" target="_blank" rel="noopener noreferrer" style="outline: none;"><strong>POST TITLE</strong></a></p>
</blockquote>
<p>Other stuff</p>
<blockquote>Not matched</blockquote>

When the regex above is applied, for example as in preg_replace($pattern, $replace, $content), the above block transforms into:

<h3>Te sugerimos</h3>
<p><a href="https://example.com/post-title/" target="_blank" rel="noopener noreferrer" style="outline: none;"><strong>POST TITLE</strong></a></p>

<p>Other stuff</p>
<blockquote>Not matched</blockquote>

Which I assume is your desired output.
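
To try this locally before touching any real data, here is a minimal PHP sketch that applies the pattern and replacement above to the test block. The variable names are only illustrative, and the optional `$count` argument of `preg_replace` is used to report how many blocks were rewritten.

```php
<?php
// Pattern and replacement copied from the answer above.
$pattern = '~<blockquote>\s*(.+?)<p class="entry-title">(.+?)<\/blockquote>~s';
$replace = '\1<p>\2';

// Sample content: the test block from above (nowdoc, so nothing is interpolated).
$content = <<<'HTML'
<blockquote>
<h3>Te sugerimos</h3>
<p class="entry-title"><a href="https://example.com/post-title/" target="_blank" rel="noopener noreferrer" style="outline: none;"><strong>POST TITLE</strong></a></p>
</blockquote>
<p>Other stuff</p>
<blockquote>Not matched</blockquote>
HTML;

// $count receives the number of replacements performed.
$cleaned = preg_replace($pattern, $replace, $content, -1, $count);

echo $cleaned, PHP_EOL;
echo "Replacements made: {$count}", PHP_EOL;
```

One thing worth checking on real posts: with the `s` flag, `.+?` can run across tags, so a post that has an ordinary `<blockquote>` somewhere before the suggestion block could match starting from that earlier tag. Spot-checking a handful of converted posts should show whether this matters for your content.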

Now, how to apply this to all your content? You have three basic options:

  1. Use MySQL's REGEXP_REPLACE function (available from MySQL 8.0, and in MariaDB), whether in a terminal, in phpMyAdmin, or from a PHP script. Please see "How to do a regular expression replace in MySQL?" for usage examples, then adapt them to your database structure.
  2. Handle the cleanup in PHP: Run a select query for all posts with this pattern; then modify content with preg_replace; finally update the database entries.
  3. Download a database dump, open it up in your favorite text editor (with regex support), or pipe it into your tool of choice, and do the necessary replacements; finally reload into your database. (You may want to have your site in maintenance mode while this is happening!)

Whichever way you choose to do this, be sure to back up your data first.
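
As a rough sketch of the PHP route (option 2), the function below takes post content keyed by post ID and returns only the rows the pattern actually changed, so the write-back step touches nothing else. The function name `clean_suggestion_blocks` and the sample rows are hypothetical. In a real run the input would come from a select on your posts table (by default `wp_posts`, columns `ID` and `post_content`), and each returned row would then be saved back, for example with WordPress's `wp_update_post`.

```php
<?php
/**
 * Hypothetical helper: returns [id => cleaned content] for changed rows only.
 */
function clean_suggestion_blocks(array $posts): array
{
    // Pattern and replacement from the answer above.
    $pattern = '~<blockquote>\s*(.+?)<p class="entry-title">(.+?)<\/blockquote>~s';
    $replace = '\1<p>\2';

    $changed = [];
    foreach ($posts as $id => $content) {
        $cleaned = preg_replace($pattern, $replace, $content, -1, $count);
        // preg_replace returns null on error (e.g. backtrack limit hit); skip such rows.
        if ($cleaned !== null && $count > 0) {
            $changed[$id] = $cleaned;
        }
    }
    return $changed;
}

// Made-up rows standing in for the result of a select query.
$posts = [
    101 => "<p>Intro</p>\n<blockquote>\n<h3>Te sugerimos</h3>\n<p class=\"entry-title\"><a href=\"https://example.com/post-title/\"><strong>POST TITLE</strong></a></p>\n</blockquote>\n<p>Body</p>",
    102 => "<p>No suggestion block here.</p>",
];

// Dry run: list what would be updated instead of writing anything.
$updates = clean_suggestion_blocks($posts);
foreach ($updates as $id => $content) {
    echo "Would update post {$id}", PHP_EOL;
}
```

Running it as a dry run first, and comparing a few before/after pairs by eye, is a cheap way to confirm the pattern behaves on real posts before any update query is issued.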

Markus AO