Read specific text from page into string array in C#

Question

I've tried this and searched for help but I cannot figure it out. I can get the source for a page but I don't need the whole thing, just one string that is repeated. Think of it like trying to grab only the titles of articles on a page and adding them in order to an array without losing any special characters. Can someone shed some light?

Either use an html parser or a regular expression to find the text of interest. — Klaus Byskov Pedersen, Nov 17 '11 at 11:45

score 0 · Answer 1 · answered Nov 17 '11 at 11:45

You can use a regular expression to extract the content you want from a string, such as your HTML string.

Or you can use a DOM parser such as Html Agility Pack.
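
For the DOM parser route, here is a minimal sketch using Html Agility Pack (the NuGet package `HtmlAgilityPack`). The class name `TitleScraper`, the method `ExtractText`, the sample markup and the XPath `//h2[@class='title']` are all made up for illustration; you would swap in whatever element actually wraps the text on your page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // external dependency: NuGet package "HtmlAgilityPack"

static class TitleScraper
{
    // Returns the decoded inner text of every node matching the XPath,
    // in the order SelectNodes returns them.
    public static string[] ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        var results = new List<string>();
        if (nodes == null)
            return results.ToArray();

        foreach (HtmlNode node in nodes)
        {
            // DeEntitize converts entities such as &amp; back into characters,
            // so special characters are not lost.
            results.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return results.ToArray();
    }

    static void Main()
    {
        // Hypothetical markup: assumes each title sits in <h2 class="title">.
        string html = "<div><h2 class=\"title\">Fish &amp; Chips</h2><p>x</p>"
                    + "<h2 class=\"title\">Second</h2></div>";
        foreach (string title in ExtractText(html, "//h2[@class='title']"))
            Console.WriteLine(title);
    }
}
```

Unlike a regular expression, the parser tolerates sloppy markup (unclosed tags, odd attribute quoting), which is usually why it is suggested for real-world pages.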

Hope this helps!

dknaack

60,192
27
155
202

Oblig: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Oded Nov 17 '11 at 11:51

ipr101 · Answer 2 · 2011-11-17T12:09:04.477

0

You could use something like this -

using System.Linq;
using System.Text.RegularExpressions;
var text = "12 hello 45 yes 890 bye 999";
var matches = Regex.Matches(text, @"\d+").Cast<Match>().Select(m => m.Value).ToList(); // "12", "45", "890", "999"

The example pulls all numbers in the text variable into a list of strings. But you could change the Regular Expression to do something more suited to your needs.
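
As a rough sketch of how that pattern might be adapted to the question, the following pulls the text between `<h2>` tags into a string array and decodes HTML entities so special characters survive. The choice of `h2`, the class name `RegexTitles` and the sample input are assumptions for illustration only; a regular expression like this only holds up on simple, predictable markup.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

static class RegexTitles
{
    // Assumes (hypothetically) that each title is wrapped in an <h2> element.
    public static string[] Extract(string html)
    {
        // Singleline lets '.' span newlines; the lazy (.*?) stops at the
        // first closing tag instead of swallowing everything up to the last one.
        var pattern = new Regex(@"<h2\b[^>]*>(.*?)</h2>",
                                RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return pattern.Matches(html)
                      .Cast<Match>()
                      // HtmlDecode turns entities such as &amp; back into characters.
                      .Select(m => WebUtility.HtmlDecode(m.Groups[1].Value).Trim())
                      .ToArray();
    }

    static void Main()
    {
        string html = "<h2>Fish &amp; Chips</h2><p>x</p><h2 class=\"t\">Second</h2>";
        foreach (string title in Extract(html))
            Console.WriteLine(title);
    }
}
```

Any tags nested inside the matched element (for example a link around the title) stay in the captured text, so they would need a second pass to strip.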

edited Nov 17 '11 at 12:09

answered Nov 17 '11 at 11:56

ipr101


score 0 · Answer 3 · answered Nov 17 '11 at 12:17

If the page is well-formed XML, you could use LINQ to XML: load the page into an XDocument, use XPath or another way of traversing to reach the element(s) you want, and load what you need into your array (or just use the enumerable if all you want to do is enumerate). If the page is not under your control, though, this is brittle: any subtle change that breaks the well-formedness of the XML will break your code. If that's the case, you're probably better off using regular expressions. Either way, the page could be changed under you and your code would suddenly stop working.

The best thing you could do would be to get the provider of the page to expose what you need as a web service rather than trying to scrape their page.
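
If you do go the LINQ to XML route, a minimal sketch might look like the following. It assumes the page really is well-formed XML; the class name `XmlTitles`, the sample document and the XPath `//h2` are hypothetical. `XDocument.Parse` throws an `XmlException` on markup that is not well-formed, which is exactly the brittleness described above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // provides the XPathSelectElements extension method

static class XmlTitles
{
    // Returns the text of every element matching the XPath expression.
    public static string[] Extract(string xml, string xpath)
    {
        // Throws XmlException if the input is not well-formed XML.
        XDocument doc = XDocument.Parse(xml);

        // Element.Value is already entity-decoded (&amp; becomes &).
        return doc.XPathSelectElements(xpath)
                  .Select(e => e.Value.Trim())
                  .ToArray();
    }

    static void Main()
    {
        // Hypothetical page: titles are assumed to live in <h2> elements.
        string page = "<html><body><h2>Fish &amp; Chips</h2><p>x</p>"
                    + "<h2>Second</h2></body></html>";
        foreach (string title in Extract(page, "//h2"))
            Console.WriteLine(title);
    }
}
```

One caveat: if the page declares a default namespace (as XHTML documents do), a plain `//h2` will match nothing, and the query needs a namespace-aware form instead.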
