2

I have the following query:

SELECT item from table

Which gives me:

<title>Titanic</title>

How would I extract the name "Titanic" from this? Something like:

SELECT re.find('\>(.+)\>, item) FROM table

What would be the correct syntax for this?

Downgoat
  • 13,771
  • 5
  • 46
  • 69
David542
  • 104,438
  • 178
  • 489
  • 842
  • Don't store "markup" with the title, store only `Titanic`. When fetching the `title` to build a web page, _then_ add the tags around it. – Rick James Jul 27 '15 at 23:35

4 Answers4

2

By default, MySQL does not provide functionality for extracting text using regular expressions. You can use REGEXP to find rows that match something like >.+<, but there is no straightforward way of extracting the captured group without some additional effort, such as:

  • using a library like lib_mysqludf_preg
  • writing your own MySQL function to extract matched text
  • performing regular string manipulation
  • using the regex functionality of whatever environment you're using MySQL from (e.g. PHP's preg_match)
  • reconsidering your need for regular expressions entirely. If you know that all your rows contain a <title> tag, for instance, it may be a better idea to simply use "normal" string functions such as SUBSTRING
George Bahij
  • 597
  • 2
  • 9
1

As pointed out in the informative answer by George Bahij MySQL lacks this functionality so the options would be to either extend the functionality using udfs etc, or use the available string functions, in which case you could do:

SELECT 
  SUBSTR(
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(item,'<title>',2)
      ,'</title>',1) 
    FROM 8
  )
from table

Or if the string you need to extract from always is on the format <title>item</title> then you could simple use replace: replace(replace(item, '<title>', ''), '</title>','')

Community
  • 1
  • 1
jpw
  • 44,361
  • 6
  • 66
  • 86
  • What does "`FROM 8`" do? – David542 Jul 18 '15 at 00:19
  • @David542 It skips the first 8 characters of the string, in this case ``. I used it because the substring_index returns the string starting from the search term. I'm not that familiar with the string functions in MySQL so there might be better ways to do it. – jpw Jul 18 '15 at 00:22
0

This regex: <\w+>.+</\w+> will match content in tags.

Your query should be something like:

SELECT * FROM `table` WHERE `field` REGEXP '<\w+>.+</\w+>';

Then if you're using PHP or something similar you could use a function like strip_tags to extract the content between the tags.

Daniel Waghorn
  • 2,997
  • 2
  • 20
  • 33
  • I understand that. Every single row will have that tag and I want to extract the title for each row. – David542 Jul 18 '15 at 00:01
  • OK, if the tags are present in every entry you don't need to restrict the query with the regex. Still MySQL doesn't natively contain functionality to capture regex matches. Your best bet is processing the output or using the substring method described by @GeorgeBahij – Daniel Waghorn Jul 18 '15 at 00:05
0

XML shouldn't be parsed with regexes, and at any rate MySQL only supports matching, not replacement.

But MySQL supports XPath 1.0. You should be able to simply do this:

SELECT ExtractValue(item,'/title') AS item_title FROM table;

https://dev.mysql.com/doc/refman/5.6/en/xml-functions.html

Michael - sqlbot
  • 169,571
  • 25
  • 353
  • 427