1

MySQL 5.7 database has wp_post table in which the post_content column is located. The post_content column contains HTML code (WordPress blog posts) in which there are text and different links. I need to build a SQL request so that in all posts of the post_content links of the type:

http://example.com/component/tags/tag/15-text.html

Replace with links of the type:

http://example.com/tag/text.html

That is, I need to replace all the characters that are between 'example.com/' and 'text.html', with 'tag/' taking into account the fact that the number in the link can be any one-, two- or three-digit (from 1 to 999 ).

If the wildcard '%' could be used in the REPLACE function, then the request could look something like this:

UPDATE wp_posts SET post_content = REPLACE(post_content, 'component/tags/tag/%-', 'tag/');

Unfortunately, this does not work. I have no experience with databases, but it seems to me that the desired request can be created using SUBSTRING_INDEX, but I didn't succeed in doing it myself. I would be grateful for any help.

More examples:

There are -> should be

http://example.com/component/tags/tag/15-text.html  ->  http://example.com/tag/text.html

http://example.com/component/tags/tag/1-sometext.html  ->  http://example.com/tag/sometext.html

http://example.com/component/tags/tag/2-anothertext.html  ->  http://example.com/tag/anothertext.html

and so on

http://example.com/component/tags/tag/999-moreanothertext.html  ->  http://example.com/tag/moreanothertext.html

What exactly needs to be done: to remove from these links part Component/Tags/Tag/Somenumber-, Where is Somenumber is only Placeholder (Variable),

but everything else does not need to be changed

marsinden
  • 23
  • 5
  • Does this answer your question? [How to do a regular expression replace in MySQL?](https://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql). “If the wildcard '%' could be used in the REPLACE function” Is it? I don't think it works with `REPLACE`. It would be match with every text so it would remove the text between 2 links, for example. – A.L Jul 07 '23 at 18:15
  • What version of mysql are you using? You verified `post_content` is plain text and not serialized already? – user3783243 Jul 07 '23 at 18:17
  • @user3783243: MySQL version 5.7, text not serialized – marsinden Jul 07 '23 at 18:31
  • Is `tag` just placeholder text in your example, and that could be variable like `example.com/thing1/text.html` and `example.com/anotherthing/text.html`? – WOUNDEDStevenJones Jul 07 '23 at 18:32
  • @woundedstevenjones: `tag` - not a placeholder (variable). In fact, the variable in my example is only the number, namely `15` – marsinden Jul 07 '23 at 18:41
  • every URL has `-text.html` at the end? Or is that part variable too? Could you provide a few different example URLs in your question for more clarity of what you're looking for? Is `http://example.com/component/tags/tag/16-another.html` a valid URL that should rewrite to `http://example.com/tag/another.html`? – WOUNDEDStevenJones Jul 07 '23 at 18:46
  • In 5.7 this would be a large task. I would pull data out via PHP, update in PHP, and then update db. Can be done in MySQL with a lot of `substr` and `locate` calls. (also presumes that the link would only ever occur once per record in `post_content`) – user3783243 Jul 07 '23 at 18:47
  • @woundedstevenjones: probably I made the question not quite clear... I edited it – marsinden Jul 07 '23 at 19:43

3 Answers3

1

This can be done via straightforward string replacement methods - CONCAT, SUBSTRING, LOCATE, REPLACE. The first 3 queries show the step-by-step logic, while query #4 puts it all together.

These queries assume http://example.com/component/tags/tag/ is the same for all URLs. If that's not the case, you could use similar logic to detect the position of /tag/ and use SUBSTRING from there.

The final query to actually update the table (query #6; note that duplicate URLs might exist after this and can be detected ahead of time with query #5 below):

UPDATE
  test
SET
  url = CONCAT(
    'http://example.com/tag/',
    SUBSTRING(
      REPLACE(
        url,
        'http://example.com/component/tags/tag/',
        ''
      ),
      LOCATE(
        '-',
        REPLACE(
          url,
          'http://example.com/component/tags/tag/',
          ''
        )
      ) + 1
    )
  );

Schema (MySQL v5.7)

CREATE TABLE test (
  id INT,
  url TEXT
);
INSERT INTO test (id, url) VALUES (1, 'http://example.com/component/tags/tag/15-thing.html');
INSERT INTO test (id, url) VALUES (2, 'http://example.com/component/tags/tag/16-another.html');
INSERT INTO test (id, url) VALUES (3, 'http://example.com/component/tags/tag/9999-texttext.html');
INSERT INTO test (id, url) VALUES (4, 'http://example.com/component/tags/tag/9999-more-and-more-and-more-and-more-text.html');
INSERT INTO test (id, url) VALUES (5, 'http://example.com/component/tags/tag/6534562-thing.html');
INSERT INTO test (id, url) VALUES (6, 'http://example.com/component/tags/tag/0-thing.html');
INSERT INTO test (id, url) VALUES (7, 'http://example.com/component/tags/tag/1-sometext.html');
INSERT INTO test (id, url) VALUES (8, 'http://example.com/component/tags/tag/2-anothertext.html');
INSERT INTO test (id, url) VALUES (9, 'http://example.com/component/tags/tag/999-moreanothertext.html');

Query #1 - remove the known/standard text before the filename (e.g. 15-thing.html)

SELECT
  REPLACE(
    url,
    'http://example.com/component/tags/tag/',
    ''
  ) as variable_text
FROM test;
variable_text
15-thing.html
16-another.html
9999-texttext.html
9999-more-and-more-and-more-and-more-text.html
6534562-thing.html
0-thing.html
1-sometext.html
2-anothertext.html
999-moreanothertext.html

Query #2 - find the index of - in the remaining string (e.g. 3)

SELECT
  LOCATE(
    '-',
    REPLACE(
      url,
      'http://example.com/component/tags/tag/',
      ''
    )
  ) as hyphen_index
FROM test;
hyphen_index
3
3
5
5
8
2
2
2
4

Query #3 - remove everything after that index (e.g. thing.html)

SELECT
  SUBSTRING(
    REPLACE(
      url,
      'http://example.com/component/tags/tag/',
      ''
    ),
    LOCATE(
      '-',
      REPLACE(
        url,
        'http://example.com/component/tags/tag/',
        ''
      )
    ) + 1
  ) as filename
FROM test;
filename
thing.html
another.html
texttext.html
more-and-more-and-more-and-more-text.html
thing.html
thing.html
sometext.html
anothertext.html
moreanothertext.html

Query #4 - construct the full URL (e.g. http://example.com/tag/thing.html)

SELECT
  CONCAT(
    'http://example.com/tag/',
    SUBSTRING(
      REPLACE(
        url,
        'http://example.com/component/tags/tag/',
        ''
      ),
      LOCATE(
        '-',
        REPLACE(
          url,
          'http://example.com/component/tags/tag/',
          ''
        )
      ) + 1
    )
  ) as new_url
FROM test;

Query #5 - detect duplicates ahead of actually updating the table

SELECT GROUP_CONCAT(id) as duplicate_ids, new_url FROM (
  SELECT
    id,
    CONCAT(
      'http://example.com/tag/',
      SUBSTRING(
        REPLACE(
          url,
          'http://example.com/component/tags/tag/',
          ''
        ),
        LOCATE(
          '-',
          REPLACE(
            url,
            'http://example.com/component/tags/tag/',
            ''
          )
        ) + 1
      )
    ) as new_url
  FROM test) as new_urls
GROUP BY new_url
HAVING COUNT(*) > 1;
duplicate_ids new_url
1,5,6 http://example.com/tag/thing.html

Query #6 - actually update the table

UPDATE
  test
SET
  url = CONCAT(
    'http://example.com/tag/',
    SUBSTRING(
      REPLACE(
        url,
        'http://example.com/component/tags/tag/',
        ''
      ),
      LOCATE(
        '-',
        REPLACE(
          url,
          'http://example.com/component/tags/tag/',
          ''
        )
      ) + 1
    )
  );

There are no results to be displayed.


Query #7 - view the new URLs

SELECT * FROM test;

View on DB Fiddle

WOUNDEDStevenJones
  • 5,150
  • 6
  • 41
  • 53
  • probably I made the question not quite clear sorry, it's my first question here. The column has many different links like `http://example.com/component/tags/tag/somenumber-sometext.html` What exactly needs to be done: to remove from these links part `Component/Tags/Tag/Somenumber-`, Where is `Somenumber` is Placeholder (Variable), but everything else does not need to be touched. I edited the question, added more examples. Thanks for the answer. – marsinden Jul 07 '23 at 19:58
  • Using specific examples from my posted answer, which part of this is not working correctly? `http://example.com/component/tags/tag/15-thing.html` becomes `http://example.com/tag/thing.html`, and `http://example.com/component/tags/tag/9999-texttext.html` becomes `http://example.com/tag/texttext.html`. Edit: I added the examples from your question, and this seems to work exactly as described. If that's not the case, please clarify what isn't working. – WOUNDEDStevenJones Jul 07 '23 at 20:01
  • Your code works, but not in my case. Probably because the post_content column contains HTML (WordPress blog posts) in which there are text and different links, and not just links. It's my fault. I'm really very sorry. Had to edit the question again. – marsinden Jul 07 '23 at 22:14
  • 1
    Please include specific examples of what the database contains. Each edit seems to be adding complexity which might be easiesr in a PHP script instead of a MySQL query, but without the full details and specific examples we can't suggest specific solutions. – WOUNDEDStevenJones Jul 08 '23 at 02:57
1
UPDATE wp_posts
SET post_content = CONCAT(
    SUBSTRING_INDEX(post_content, 'example.com/', 1), -- Extract the part before 'example.com/'
    'example.com/tag/',                             -- Append the new base URL
    SUBSTRING_INDEX(SUBSTRING_INDEX(post_content, 'example.com/', -1), '-text.html', 1), -- Extract the number between 'example.com/' and '-text.html'
    '-text.html',                                   -- Append the constant '-text.html'
    SUBSTRING_INDEX(post_content, '-text.html', -1) -- Extract the part after '-text.html'
)
WHERE post_content LIKE '%example.com/component/tags/tag/%-text.html%';

With the examples given:

UPDATE wp_posts
SET post_content = CONCAT(
    SUBSTRING_INDEX(post_content, 'example.com/', 1), -- Extract the part before 'example.com/'
    'example.com/tag/',                             -- Append the new base URL
    REGEXP_REPLACE(
        REGEXP_SUBSTR(post_content, 'example.com/component/tags/tag/[0-9]{1,3}-([^"]+).html'),
        'example.com/component/tags/tag/[0-9]{1,3}-',
        ''
    )
)
WHERE post_content REGEXP 'example.com/component/tags/tag/[0-9]{1,3}-([^"]+).html';
Matt
  • 14,906
  • 27
  • 99
  • 149
  • Their question wasn't clear initially, but `-text` is also variable. The filename could be `15-text.html` or `999-another-one.html`. – WOUNDEDStevenJones Jul 07 '23 at 20:09
  • @matt: Sorry, I was too inattentive. Had to edit the question several times. REGEXP_REPLACE function was released in version 8.0.4. Thanks a lot – marsinden Jul 07 '23 at 22:24
0

You need not just a replace, but a replace with regexp. Look at the request below.

SELECT REGEXP_REPLACE('http://example.com/component/tags/tag/15-text.html', 'http://([^/]+)[^-]+-(text\.html)', 'http://$1/tags/$2');
e1st0rm
  • 95
  • 2