4

We have a site using wordpress and we have discovered that at some point, a bad plugin or user error has added double slashes after the siteurl (for example, http://example.site//category1/ or http://example.site/category1//category2/, etc.

This seems to work but it looks like there aren't quite enough results.

SELECT id, post_content
FROM `wp_posts`
where post_content
regexp '(href="[^"]*[^:]\/\/[^"]*)'
and post_status  in('draft','publish')
order by id asc

Is there a better way to do this? I don't want it to match on the double slash that comes after the http:, hence the negative match on the :.

Edit: for clarification, I want to find all posts (the body of a wordpress post/page) that have a url hard-coded into the page that has double slashes, but do not match on the double slashes after the http:.

Regexp should match on the following: http://example.site//category1/ or http://example.site/category1//category2/ or even http://example.site/category1/category2// or example.site/category1//category2/

But should not match on the following: http://example.site/category1/ or http://example.site/category1/category2/

Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
amatusko
  • 348
  • 2
  • 14

2 Answers2

3

Perhaps something like this would work.

SELECT *
FROM wp_posts
WHERE CASE WHEN instr(post_content,'http://') > 0 THEN 
  substring(post_content,7) regexp '\/\/'
ELSE
  post_content regexp '\/\/'
END

And here is the SQL Fiddle.

Good luck.

sgeddes
  • 62,311
  • 6
  • 61
  • 83
  • +1 for using [`INSTR`](http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr) – xkeshav Jan 29 '13 at 20:05
0

You could use:

regexp '(https?:\/\/|www\.)[^ ]*\/\/'

This will match a post if it contains http[s]:// or www. followed by non-space characters which have // among them.

See this SQLFiddle (adapted from the fiddle of sgeddes).

Or you could just reduce your regex to '[^:]\/\/' and look for posts containing that.

MikeM
  • 13,156
  • 2
  • 34
  • 47