
I have another question about regular expressions in F#:

let tagName = "div"
let ptnTagNotClose = "<" + tagName + "(>|\s+[^>]*>)[^<]"

I want to find matches for unclosed tags in an HTML file. The pattern string works in VB.NET. But in F#, when I debug the code above, I see this value for ptnTagNotClose:

ptnTagNotClose "<div(>|\\s+[^>]*>)[^<]"

F# automatically changes "\s+" to "\\s+", but in a regular expression "\s+" and "\\s+" are different patterns, so the results also differ. Please let me know how to stop F# from automatically changing the pattern string.

A verbatim string literal could be one solution, but since tagName can change (e.g. let tagName = "br"), how can I apply a verbatim string literal in this case?

Thanks!

John

John John
  • At this point I feel compelled to point out that it is generally considered [a very bad idea](http://stackoverflow.com/a/1732454/24380) to parse HTML or XML with regular expressions. – Joel Mueller Dec 18 '11 at 18:16
  • What does the fact that `tagName` can change have to do with verbatim string literals? Those are completely orthogonal issues... – ildjarn Dec 19 '11 at 19:02
  • Hi, as far as I know, people can use regular expressions to extract some useful information from HTML. For well-formed XML there is no need for regular expressions, as there are enough tools to get the useful information. I know it is not easy to get all the information from HTML with regular expressions. But as long as I can get the information the rest of my program needs, I think that is enough! – John John Dec 19 '11 at 21:17

1 Answer


I don't think that the debug output means what you think it does. Using a verbatim string (like "<" + tagName + @"(>|\s+[^>]*>)[^<]") will give you exactly the same result: because \s isn't a valid escape sequence, F# interprets the backslash as a literal backslash rather than an escape character.
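To check this yourself, here is a minimal sketch you could paste into F# Interactive. It builds the pattern both ways, with a regular string and with a verbatim string, and compares them. The `plain`, `verbatim`, `html` and `m` names and the sample HTML are my own, chosen only for illustration; they are not from the question.

```fsharp
open System.Text.RegularExpressions

let tagName = "div"

// Regular string: "\s" is not a recognised F# escape sequence,
// so the backslash is kept as a literal character.
let plain = "<" + tagName + "(>|\s+[^>]*>)[^<]"

// Verbatim string: backslashes are never treated as escapes.
// Concatenating with tagName works exactly as before.
let verbatim = "<" + tagName + @"(>|\s+[^>]*>)[^<]"

// Printing shows the actual characters, with a single backslash.
printfn "%s" plain
printfn "%b" (plain = verbatim)

// Hypothetical sample input, just to show the pattern being used.
let html = "<div class=\"a\">text</div>"
let m = Regex.Match(html, verbatim)
printfn "%b %s" m.Success m.Value
```

If the two strings compare equal, the doubled backslash seen in the debugger is only how the value is displayed, not what is stored in the string.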

kvb
  • Hi, kvb: I found the issue. Inside the VS2010 IDE, the debugger's display of the pattern was wrong, but in F# Interactive the pattern was shown correctly. Still, there seem to be many differences between strings/regular expressions in VB.NET and in F#, so what works in VB.NET cannot be reproduced in F#. – John John Dec 19 '11 at 21:13
  • Hi, I still believe that Regex in F# works differently from Regex in VB.NET. If someone tries using Regex in F# to extract some table elements from an HTML file, they will see what I mean. Using Regex in VB.NET usually works! – John John Jan 29 '12 at 10:58