0

I have a mysql database with 400k posts. In arround 100k posts are images. Example:

text text text <img src="/images/test.jpg"> text text text

The complete image string <img src="/images/test.jpg"> must be deleted. But its not always <img src/images/test.jpg"> there are many variants. So I could not use the typical sql replace .

What I need is "sql replace between two characters". For me, delete / replace everything between "<img" and ">" and off course incloude the <img and >

Is that anyways possible? Thank you

Johan
  • 74,508
  • 24
  • 191
  • 319
labu77
  • 605
  • 1
  • 9
  • 30
  • [What have you tried?](http://www.whathaveyoutried.com/) See [ask advice](http://stackoverflow.com/questions/ask-advice), please. – John Conde Feb 09 '13 at 22:28
  • The database part is not the important part. What you're really talking about is parsing HTML. Look at http://htmlparsing.com/ for ideas. You're not going to be able to do this only with SQL. – Andy Lester Feb 09 '13 at 22:30
  • thank you so much for your quick reply. 100k entries are a lot. I need to remove this images complete out of the database. I know I could replace them with php before they are displaying but for me it would be better to delete the images direct in the database. I tried it with something like this stackoverflow.com/questions/8850798/sql-replace-all-characters-between-two-strings but without success. Thank you for help. – labu77 Feb 09 '13 at 22:39
  • Added the regex tag to broaden the audience. I believe the answer lies there. – Dan Bracuk Feb 09 '13 at 22:44
  • I don't understand your question. Are those pictures in the database? Do you just wanna remove the tag from HTML or you want to remove the pictures too? What do you need to do with the database? – SwiftMango Feb 09 '13 at 22:49
  • do you have a example? I am not good with regex and i am a dummy with sql :/ – labu77 Feb 09 '13 at 22:49
  • Hi Texas bruce, the image path is in the database and i want remove the complete image path from the database. thank you in advance – labu77 Feb 09 '13 at 22:51
  • @DanBracuk, as [has been definitively answered](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454), this is most definitely ***not*** a regex problem. – Jim Garrison Feb 09 '13 at 22:58
  • Please show us your table structure (are `text`s separate fields?), some examples of the existing tag variations, and the final result you expect in the database. Is the `` tag embedded inside XML or HTML? Also, what's the purpose of an empty `` tag if it's _always_ going to be empty? Why not just remove it completely? – Branko Dimitrijevic Feb 09 '13 at 23:02
  • Hi Branko, thank you. Here is thestructure: DB: phpbb Table: phpbb_posts Field: post_text It´s a old database from 2002. From 2002 - 2007 the phpbb forum had html allowed. Many people placed images over html in the forum. Like I said arround 100 000. I tested it with %Smile – labu77 Feb 09 '13 at 23:15
  • NO is not embedded anymore because html is disabled. For me all html that begins with could be deleted. Yes, I want remove it completly out of the database. Again: every path that looks like could be deleted. – labu77 Feb 09 '13 at 23:18
  • possible duplicate of [MySQL Regex Replace](http://stackoverflow.com/questions/1755408/mysql-regex-replace) – Johan Feb 10 '13 at 10:07

1 Answers1

0

MySQL supports regex:

UPDATE Mytable SET url = '' WHERE url RLIKE someregex;

See: http://dev.mysql.com/doc/refman/5.1/en/regexp.html

Something like:

UPDATE Mytable SET url = '' WHERE url RLIKE '\\<img.\\>'

But note that MySQL does not support output of regex mangled strings. It will only compare.

However, there is a MySQL user function REGEX_REPLACE, see: https://launchpad.net/mysql-udf-regexp
See also this question: How to count words in MySQL / regular expression replacer?

Community
  • 1
  • 1
Johan
  • 74,508
  • 24
  • 191
  • 319