1

I need to extarct all links from a database so I can create a URL rewrite. Using a SQL statement, how can I extract just links from a table? Sometimes there is more than one link within a cell which further complicates it. Any idea on how to achieve this?

EDIT An example of this would be: SELECT myval FROM htmlcontrols. "myval" has an HTML string such as "<div>Hi this is a test. <a href="somewhere.htm">Click here</a> or <a href="http://somewhereelse.com/testarea">here</a></div>". I want an extract like this:

LINKS
-----
somewhere.htm
http://somewhereelse.com/testarea
TruMan1
  • 33,665
  • 59
  • 184
  • 335
  • 3
    You'll need to show an example of the query so far, and what your data might look like. There's lots of (bad) ways to store multi-valued attributes. – Yuck Oct 21 '11 at 14:49
  • From the Question, I'd say that there are text fields with html in them. And the OP needs to strip out each and every link from those html strings. – MatBailie Oct 21 '11 at 14:51
  • @Dems Right, but delimiters are important for one thing. Is it just text with links in the mix, as you said? Or a field that may have 1+ links separated by `| ; \t \r\n`...? – Yuck Oct 21 '11 at 14:54
  • Thanks I modified my question with the extra details. There is no multi-valued attributes.. just a plain HTML string column. – TruMan1 Oct 21 '11 at 15:16
  • This would be easier with CLR and Regex (or CLR and HTML parser dependant on degree of accuracy required) – Martin Smith Oct 21 '11 at 15:19

1 Answers1

-1

You are probably better off either handling this on the client side, iterating through the fields and parsing the HTML to then re-insert them on whatever table/columns you need to; or at least, creating a UDF that can do the parsing efficiently.

Note that the link I posted above is an implementation of a UDF RegEx function but I am not suggesting necessarily that you should use a RegEx to parse HTML as this is almost always a bad idea.

If you go with the CLR function, take a look at HTMLAgilityPack

Community
  • 1
  • 1
Icarus
  • 63,293
  • 14
  • 100
  • 115