3

I am moving a site that I manage from a custom CMS to Wordpress and am finding that some of the attributes on image tags to be problematic on display once in the WP environment. To fix this, I need to strip out the height attribute that is inlined into each image tag in the post_content column of the wp_posts table.

Starting with the original value in the DB, I want the following:

<img src="http://example.com/img/20150823_image.jpg" style="width: 730px; height: 730px;" />

To become:

<img src="http://example.com/img/20150823_image.jpg" style="width: 730px;" />

So, essentially, I need to trim out the " height: 730px;" portion. It is image-specific, so in this case it is 730 but in another it could be 1500, 447, 80, etc.

I was trying to see if I could use a '%' as a wildcard but that doesn't seem to be working...

UPDATE wp_posts SET post_content = REPLACE(post_content,' height: %px;','');

Any help would be greatly appreciated as I'd rather not have to manually go through thousands of rows stripping these out.

miken32
  • 42,008
  • 16
  • 111
  • 154
SSeybold
  • 39
  • 5
  • 2
    You wold need to use a regex to make the substitution. Unfortunately, SQL doesn't offer that in a replace function. Your best bet would be to write a php script to step through the rows and do the replacement. – Sloan Thrasher Apr 12 '17 at 16:56
  • Here's some documentation on LIKE, perhaps it will help. https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql – Adam Starrh Apr 12 '17 at 16:58

3 Answers3

2

You can use a function to do the text parsing:

create function f_strip_height( in_s text ) returns text
begin

declare v_start int;
declare v_end int;
declare v_height text;

select locate(' height:', in_s ) into v_start;
if (v_start>0) then

  select locate( 'px;', substring( in_s, v_start)  ) into v_end;

  select trim(substring(substring( in_s, v_start, v_end+2), 9)) into v_height;

  if (v_end>0 and concat(cast(v_height as unsigned), 'px;' =  v_height)) then
    return concat(substring( in_s, 1, v_start-1), substring( in_s, v_start+v_end+2));
  end if;
end if;

return in_s;
end

Then use the function:

UPDATE wp_posts SET post_content = f_strip_height(post_content);
slaakso
  • 8,331
  • 2
  • 16
  • 27
  • This will work, as long as the format is exactly the same (`style` is the last attribute of the element, `height` is the last rule inside that attribute, double quotes are always used, the element is self-closing, etc, etc.) – miken32 Apr 12 '17 at 18:03
  • One can easily improve the function to be more generic. Modified the example code. – slaakso Apr 12 '17 at 18:36
  • That wasn't really my point, because I could keep playing this game: what about `height: 0;`, what if it's the last rule and has no semicolon, what about other elements that have height set? What I'm trying to say is that parsing markup with nothing more than substring indexes is going to be very fragile, even moreso than [parsing markup with regular expressions](http://stackoverflow.com/q/1732348/1255289)! – miken32 Apr 12 '17 at 18:42
  • (For the record, I'm well aware that this is nit-picking; I still gave you an upvote for the effort!) – miken32 Apr 12 '17 at 18:52
  • The use case was quite staightforward, purpose was not to create a full CSS parser. One can just see the CSS height combinations existing in the data and prepare the function for that. After the update is run, SSeybold can transfer the data to Wordpress. – slaakso Apr 12 '17 at 19:11
1

This is not a job for SQL. Here's a simple (?) PHP script that should do the trick, though I'm doing this off the top of my head so no guarantees:

<?php
// create the DB connection
$db = new PDO("mysql:host=localhost;dbname=wordpress", "user", "password");
// quiet warnings
libxml_use_internal_errors(true);
// prepare the update statement for later
$stmt = $db->prepare("UPDATE wp_posts SET post_content = ? WHERE post_id = ?");
// select the posts that at least have the word "height:" in them
$posts = $db->query("SELECT post_id, post_content FROM wp_posts WHERE post_content LIKE '%height:%'");
// loop through the posts
while ($post = $posts->fetch(PDO::FETCH_ASSOC)) {
    // create a DOM document
    $dom = new DomDocument();
    // load the HTML into the DOM parser
    $dom->loadHTML($post["post_content"], LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    // prepare the XPath
    $xpath = new DomXPath($dom);
    // get all img elements with a style attribute containing the word height
    $imgs = $xpath->query("//img[contains(@style, 'height')]");
    foreach ($imgs as $img) {
        // get the style attribute value
        $style = $img->getAttribute("style");
        // remove height
        $style = preg_replace("/height\s*:\s*\d+(px)?;?/", "", $style);
        // replace the attribute value
        $img->setAttribute("style", $style);
    }
    // output the new HTML
    $newhtml = $dom->saveHTML();
    echo "Updating post $post["post_id"] with new content:\n$newhtml\n\n";
    // save it into the database -- uncomment this line when you trust the script!
//    $stmt->execute([$newhtml, $post["post_id"]]);
}
miken32
  • 42,008
  • 16
  • 111
  • 154
1

If you have the appropriate permission, you can use a UDF 27.4.2 Adding a New User-Defined Function, some can be:

In another case, as already mentioned, you can do your own function, here a version that you can modify and adjust as needed:

mysql> DROP TABLE IF EXISTS `wp_posts`;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE IF NOT EXISTS `wp_posts` (
    ->     `post_content` TEXT
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO `wp_posts`
    ->     (`post_content`)
    -> VALUES
    ->     ('<img src="http://example.com/img/20150823_image.jpg" style="width: 730px; height: 730px;" />'),
    ->     ('<img src="http://example.com/img/20150824_image.jpg" style="width: 730px; height: 1500px;" />'),
    ->     ('<img src="http://example.com/img/20150825_image.jpg" style="width: 730px; height: 80px;" />'),
    ->     ('<img src="http://example.com/img/20150826_image.jpg" style="width: 730px; height: 0px;" />'),
    ->     ('<img src="http://example.com/img/20150827_image.jpg" style="width: 730px;" />');
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> DELIMITER //

mysql> DROP FUNCTION IF EXISTS `get_string`//
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE FUNCTION `get_string`(`_string` TEXT,
    ->                              `_begin` VARCHAR(255),
    ->                              `_end` VARCHAR(255))
    ->     RETURNS TEXT DETERMINISTIC
    -> BEGIN
    ->     DECLARE `_begin_pos` INT UNSIGNED DEFAULT LOCATE(`_begin`, `_string`);
    ->     DECLARE `_end_pos` INT UNSIGNED DEFAULT 0;
    ->     IF `_begin_pos` IS NOT NULL AND `_begin_pos` > 0 THEN
    ->         SET `_end_pos` := LOCATE(`_end`, `_string`, `_begin_pos`);
    ->         IF `_end_pos` IS NOT NULL AND `_end_pos` > 0 THEN
    ->             RETURN SUBSTRING(`_string`,
    ->                              `_begin_pos`,
    ->                              (`_end_pos` + CHAR_LENGTH(`_end`)) - `_begin_pos`); 
    ->         END IF;
    ->     END IF;
    ->     RETURN '';
    -> END//
Query OK, 0 rows affected (0.00 sec)

mysql> DELIMITER ;

mysql> SELECT `post_content`
    -> FROM `wp_posts`;
+-----------------------------------------------------------------------------------------------+
| post_content                                                                                  |
+-----------------------------------------------------------------------------------------------+
| <img src="http://example.com/img/20150823_image.jpg" style="width: 730px; height: 730px;" />  |
| <img src="http://example.com/img/20150824_image.jpg" style="width: 730px; height: 1500px;" /> |
| <img src="http://example.com/img/20150825_image.jpg" style="width: 730px; height: 80px;" />   |
| <img src="http://example.com/img/20150826_image.jpg" style="width: 730px; height: 0px;" />    |
| <img src="http://example.com/img/20150827_image.jpg" style="width: 730px;" />                 |
+-----------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

mysql> UPDATE `wp_posts`
    -> SET `post_content` = REPLACE(`post_content`, `get_string`(`post_content`, ' height:', ';'), '');
Query OK, 4 rows affected (0.01 sec)
Rows matched: 5  Changed: 4  Warnings: 0

mysql> SELECT `post_content`
    -> FROM `wp_posts`;
+-------------------------------------------------------------------------------+
| post_content                                                                  |
+-------------------------------------------------------------------------------+
| <img src="http://example.com/img/20150823_image.jpg" style="width: 730px;" /> |
| <img src="http://example.com/img/20150824_image.jpg" style="width: 730px;" /> |
| <img src="http://example.com/img/20150825_image.jpg" style="width: 730px;" /> |
| <img src="http://example.com/img/20150826_image.jpg" style="width: 730px;" /> |
| <img src="http://example.com/img/20150827_image.jpg" style="width: 730px;" /> |
+-------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

Example in Rextester.

wchiquito
  • 16,177
  • 2
  • 34
  • 45