3

I've noticed that MySql has an extensive search capacity, allowing both wildcards and regular expressions. However, I'm in somewhat in a bind since I'm trying to extract multiple values from a single string in my select query.

For example, if I had the text "<span>Test</span> this <span>query</span>", perhaps using regular expressions I could find and extract values "Test" or "query", but in my case, I have potentially n such strings to extract. And since I can't define n columns in my select statement, that means I'm stuck.

Is there anyway I could have a list of values (ideally separated by commas) of any text contained with span tags?

In other words, if I ran this query, I would get "Test,query" as the value of spanlist:

select <insert logic here> as spanlist from HtmlPages ...
Neil
  • 5,762
  • 24
  • 36
  • Thought : Return several records for each `HtmlPages` record, not several columns (and certainly not a comma delimitered list unless you REALLY have/need to) (in other words, a normalised result set). – MatBailie Jun 27 '11 at 12:00
  • I could, but I have reasons for making it a comma-separated list. The resulting query will be exported into a csv file and eventually into an excel document. Though lacking that, having multiple records, one for each span content, would be better than nothing. – Neil Jun 27 '11 at 12:05
  • Possible duplicate: http://stackoverflow.com/questions/4021507/mysql-use-regex-to-extract-string – Don Jun 27 '11 at 12:24
  • With all due respect, I don't think this is a duplicate because it is the solution to finding two words, not *n* words. – Neil Jun 27 '11 at 12:31
  • To be fair to the OP, just because using SQL for splitting a string is not *ideal*, one should not assume that using another language in another tier is ideal either. This leads to a least-worst-choice, and to make that choice the OP needs to know the SQL solution, it's costs and benefits. – MatBailie Jun 27 '11 at 13:29
  • I managed to resolve my own solution using an external editor (UltraEdit). If nobody comes up with a better solution, I'll answer my own question as the correct one. I understand that the data structure is not ideal, though I can't rewrite mediawiki so I'm stuck with the hand I'm dealt. – Neil Jun 27 '11 at 13:58

4 Answers4

1

I'm surprised no one suggested EXTRACTVALUE(xml,'xpath') I think it does exactly what you are asking for, only with a little bit of trickery to get that delimiter in there (default "delimiter" is a space).

SET @xml = '<html><span>Test</span> this <span>query</span>
    <span>etc</span><div><span>etc etc</span></div></html>';

SELECT 
    LEFT(spanlist,LENGTH(spanlist)-1) AS spanlist
FROM
    (SELECT 
        EXTRACTVALUE(REPLACE(@xml,'</span>',',</span>'),'//span') AS spanlist
    ) AS T
;

Produces

+---------------------------+
| spanlist                  |
+---------------------------+
| Test, query, etc, etc etc |
+---------------------------+

Sorry this answer may be too late for you but hope it helps the next person

KCD
  • 9,873
  • 5
  • 66
  • 75
  • +1. Albeit a bit late, that is still an excellent answer and I appreciate the effort. Unfortunately it doesn't serve me now, but I'll remember this next time something like this comes up. – Neil Jun 06 '12 at 09:14
1

First of all - your data structure is miserable. One of the most common database rules is "Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else)." It's one of the elementary rules when developing data structure.

As you see there is no possibility to write static select statement returning variable count of columns. The best approach if you do not want to change structure is to get text as is and post-process it in your favorite programming language.

piotrpo
  • 12,398
  • 7
  • 42
  • 58
  • I agree, but the question is still a good question, perhaps with different data – Don Jun 27 '11 at 12:22
  • I used html as an example, though in reality, I'm trying to extract categories from pages in a mediawiki. They're all formatted [[Category:Category name]]. They're usually together, but would be nice to be able to provide a comma-separated list to my boss rather than clump of wiki code that is *probably* accurate. – Neil Jun 27 '11 at 12:29
  • 1
    SQL is a language for manipulating relational datasets. This is not relational dataset - that's all. Best way probably will be export tags and give them proper format using some calc or excel. – piotrpo Jun 27 '11 at 12:42
1

MySQL does have a regular expression engine, but it is implemented in the WHERE clause, not as part of the query field. It therefore cannot be used to split a string.

I would suggest that you'll need to use another language for this. Query the data using an application and split it as required there, or write a small program to do it - it'd probably be just a few lines in PHP.

If you really want to do it in MySQL, you could probably write a function to do it -- see this answer for some leads: Can Mysql Split a column? -- but frankly, it's not what MySQL (or SQL in general) is designed to do. You'd be much better off querying the data normally and splitting it into bits in another language.

( I mention PHP as it's what MediaWiki is written with, so you should be able to run PHP code without installing anything new, but pretty much any language will do).

Community
  • 1
  • 1
Spudley
  • 166,037
  • 39
  • 233
  • 307
-1

Unfortunately the only solution available was to use a third party text editor such as Ultraedit and remove the tags using regular expressions. It seems it's not possible to use MySql.

Though this is not an idealic solution. If anyone ever stumbles across this question with a solution that will allow me to use MySQL to extract this information multiple lines or otherwise, I'd be more than happy to hear it.

Lacking a better solution, this will have to answer my question for now.

Neil
  • 5,762
  • 24
  • 36
  • Ultraedit is not a optimal solution and your question is tagged as MYSQL and this solution is posted as comment so please remove.My suggestion is to use while loops . – user1613360 Mar 12 '16 at 21:12