
What should the regex pattern be if my text contains characters like "\ / > <" and I want to match them literally? I ask because regex treats "/" as part of the search pattern rather than as an individual character.

For example, I want to extract Super Kings from the string <span>Super Kings</span>, using VB 2010.

Thanks!

Wiseguy
AntikM
  • 1
    You should not try to use regex to parse xml or html. Find an html or xml parser to do it for you or you'll drown. – Thunder Rabbit May 07 '12 at 04:10
  • Could you please recommend an HTML parser that works in both VB 2010 Win32 and Windows Phone 7 applications? – AntikM May 07 '12 at 04:13
  • 1
    You are not likely to get a lot of sympathy. If you read the documentation for your regex matcher, it will tell you how to match characters that are normally used as part of the search pattern, unless you have a truly awful regex engine. – Ira Baxter May 07 '12 at 04:14
  • For future travelers who come across this question http://stackoverflow.com/a/1732454/194309 – Thunder Rabbit May 07 '12 at 04:16

2 Answers

1

Just try this:

\bYour_Keyword_to_find\b

In regex, \b matches a word boundary, so the keyword is found only as a whole word.
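
A minimal sketch of the word-boundary idea. It is written in Python purely for illustration (the question is about VB 2010, but `\b` behaves the same way for this input); the `keyword` and `text` values are assumptions taken from the question's example.

```python
import re

text = "<span>Super Kings</span>"
keyword = "Super Kings"  # assumed keyword, taken from the question

# re.escape protects against metacharacters inside the keyword itself;
# \b on each side requires a word boundary around it.
pattern = r"\b" + re.escape(keyword) + r"\b"

match = re.search(pattern, text)
found = match.group(0) if match else None
print(found)

# The boundary prevents a hit inside a longer word.
partial = re.search(pattern, "Super Kingsley")
print(partial)
```

Note that this only locates a keyword you already know; it does not extract unknown text between tags.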

[EDIT]

You might be looking for this:

(?<=<span>)([^<>]+?)(?=</span>)

Explanation:

<!--
(?<=<span>)([^<>]+?)(?=</span>)

Options: case insensitive; ^ and $ match at line breaks

Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=<span>)»
   Match the characters “<span>” literally «<span>»
Match the regular expression below and capture its match into backreference number 1 «([^<>]+?)»
   Match a single character NOT present in the list “<>” «[^<>]+?»
      Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=</span>)»
   Match the characters “</span>” literally «</span>»
-->

[/EDIT]
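
As a rough check of the lookaround pattern above, here is a small Python sketch (Python is used only for illustration; the pattern itself is unchanged). The input strings are assumptions based on the question. Because the tags sit inside lookarounds, they are asserted but not included in the match.

```python
import re

# Same pattern as above, with the case-insensitive option.
inner = re.compile(r"(?<=<span>)([^<>]+?)(?=</span>)", re.IGNORECASE)

html = "<span>Super Kings</span>"
match = inner.search(html)
inner_text = match.group(1) if match else None
print(inner_text)

# With several spans, each inner text is returned separately.
several = inner.findall("<span>a</span> x <span>b</span>")
print(several)
```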

Cylian
1

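In regex flavors that delimit the pattern with slashes (JavaScript or Perl, for example), you must escape the / with \. In .NET the / is not special, so \/ is optional there but harmless.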

For instance, try: `<span>(.*)<\/span>`, `<span>([^<]*)<\/span>`, or `<span>(.*?)<\/span>`

Read more from: http://www.regular-expressions.info/characters.html
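
To make the escaping question concrete, here is a hedged sketch in Python (chosen only for illustration). Of the four characters in the question, only the backslash is a metacharacter in this flavor; the others match themselves. The sample strings are made up. Python's `re.escape` escapes a whole literal for you, and .NET offers `Regex.Escape` for the same purpose.

```python
import re

text = r"a\b / c > d < e"  # made-up sample containing \ / > <

# A literal backslash must be written as \\ in the pattern.
backslashes = re.findall(r"\\", text)

# "/", "<" and ">" need no escaping here.
others = re.findall(r"[/<>]", text)

# Escaping "/" anyway is harmless: both patterns match the same text.
with_escape = re.search(r"<\/span>", "</span>")
without_escape = re.search(r"</span>", "</span>")

# Let the library escape an arbitrary literal instead of doing it by hand.
literal = r"\ / > <"
safe_pattern = re.escape(literal)
whole = re.fullmatch(safe_pattern, literal)

print(backslashes, others, whole is not None)
```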

Peter Olson
  • 2
    Make sure that quantifier is lazy, not greedy. – Wiseguy May 07 '12 at 04:21
  • What about further limiting: `([^<]*)<\/span>` This way it will match until it finds another `<` – Peter Olson May 07 '12 at 04:30
  • 1
    @PeterOlson `([^<]*)<\/span>` works, but the tags are included in the search result. Is there a way to remove the tags and only return the actual words? – AntikM May 07 '12 at 04:33
  • If interested in the lazy solution: `(.*?)<\/span>`, for more info read http://www.regular-expressions.info/repeat.html – Peter Olson May 07 '12 at 04:33
  • Depending on how VB 2010 handles RegEx, anything within () should be returned as a separate variable, as in a search/replace you would access it with `$1` – Peter Olson May 07 '12 at 04:34
  • -1, `<span>(.*)<\/span>` will fail when searching within `<span>test1</span> other data <span>test2</span>`. – Cylian May 07 '12 at 05:00
  • 1
    Fair point, and that's why the further comments express a `not <` and lazy search method. – Peter Olson May 07 '12 at 05:03
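
The greedy-versus-lazy point from the comments can be checked with a short Python sketch (Python is used only for illustration, and the two-span input is an assumed test string). It also shows that capture group 1 holds the text without the tags, while the whole match includes them.

```python
import re

html = "<span>test1</span> other data <span>test2</span>"  # assumed input

# Greedy: .* runs to the LAST </span>, swallowing everything in between.
greedy = re.findall(r"<span>(.*)<\/span>", html)

# Lazy: .*? stops at the first </span> it can reach.
lazy = re.findall(r"<span>(.*?)<\/span>", html)

# Negated class: [^<]* cannot cross into the next tag at all.
negated = re.findall(r"<span>([^<]*)<\/span>", html)

# group(0) is the whole match (tags included); group(1) is only the capture.
first = re.search(r"<span>([^<]*)<\/span>", html)
with_tags, without_tags = first.group(0), first.group(1)

print(greedy, lazy, negated, with_tags, without_tags)
```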