I need to write a regular expression that scans the HTML source of a Wikipedia article (held in a string) for links to other Wikipedia articles.

The links usually look like this, for example:

<a href="/wiki/English Language" title="English Language">English</a>

<a href="/wiki/Spanish Language" title="Spanish Language">Spanish</a>

I tried the regular expression "<a.*href=(\"|')(.+?)(\"|')*wiki.*>". It works, but it also matches links to images, not just links to articles.
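A minimal reproduction of the problem, sketched in Python for brevity. The two sample lines and the helper name `matches` are made up for illustration and are not taken from a real page. Nothing in the pattern restricts what follows `wiki`, so any anchor whose `href` contains that word matches:

```python
import re

# The pattern from the question, written as a Python raw string.
ATTEMPT = re.compile(r'''<a.*href=("|')(.+?)("|')*wiki.*>''')

# Hypothetical sample lines: one article link and one image link.
article = '<a href="/wiki/English_language" title="English language">English</a>'
image = '<a href="/wiki/File:Flag.png" class="image"><img src="flag.png"></a>'

def matches(line):
    """Return True if the attempted pattern matches anywhere in the line."""
    return ATTEMPT.search(line) is not None

print(matches(article), matches(image))  # True True
```

Both lines match, which is the unwanted behavior described above.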

Michelle

1 Answer

I finally succeeded. I wrote a regular expression that matches the beginning of the link:

(@"<a href=""/wiki/[A-Z][A-Za-z0-9\-_]+""")
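
For readers not using .NET, here is a rough Python transcription of the same idea. It is only a sketch: the capture group, the function name `find_article_titles`, and the sample markup are additions for illustration, and it assumes the hrefs use underscores rather than spaces. Because the closing quote must come directly after the allowed characters, an href such as `/wiki/File:Flag.png` is rejected at the colon:

```python
import re

# Python version of the pattern above, with a capture group added
# around the title (the group is an illustrative addition).
ARTICLE_LINK = re.compile(r'<a href="/wiki/([A-Z][A-Za-z0-9\-_]+)"')

def find_article_titles(html):
    """Return the title part of every /wiki/ href that matches the pattern."""
    return ARTICLE_LINK.findall(html)

# Hypothetical sample markup, not taken from a real page.
sample = (
    '<a href="/wiki/English_language" title="English language">English</a> '
    '<a href="/wiki/File:Flag.png" class="image"><img src="flag.png"></a> '
    '<a href="/wiki/Spanish_language" title="Spanish language">Spanish</a>'
)

print(find_article_titles(sample))  # ['English_language', 'Spanish_language']
```

The same character class also rejects article titles containing parentheses, commas, or percent-encoded characters, so the class may need widening for those.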
Michelle