Finding what's between two given strings using regex?

Question

I am working on a simple Facebook Messenger client (one that doesn't need a developer account). So far I can retrieve all my messages: name, preview, and time. What I'd like to find now is each user's href link.

So far I have this:

    MatchCollection name = Regex.Matches(
        htmlText, "<div class=\"_l2\">(.*?)</div>");

    MatchCollection preview = Regex.Matches(
        htmlText, "<div class=\"_l3 fsm fwn fcg\">(.*?)</div>");

    MatchCollection time = Regex.Matches(
        htmlText, "<div class=\"_l4\">(.*?)</div>");

This works fully.

For the href, I've tried a few things that I found on this website, but nothing seemed to work. The link starts with `<a class="_k_ hoverZoomLink" rel="ignore" href="` and ends with a `"`.

Could someone refer me to an article that might help me work out how to get that href? A better way of doing it than regex would also be welcome, but I would really prefer regex. This is how I use the matches at the moment:

    for (int i = 0; i < name.Count; i++)
    {
        String resultName = Regex.Replace(name[i].Value, @"<[^>]*>", String.Empty);
        String newName = resultName.Substring(0, resultName.Length - 5);
        String resultPreview = Regex.Replace(preview[i].Value, @"<[^>]*>", String.Empty);
        String s = time[i].Value;
        int start = s.IndexOf("data-utime=\"") + 28;
        int end = s.IndexOf("</abbr>", start);
        String newTime = s.Substring(start, (end - start));
        threads.Add(new Thread(newName, resultPreview, newTime, ""));
    }

Thanks in advance.

I would look into using the HTML Agility Pack and XPath, not regular expressions. – hwnd, Jun 26 '15 at 19:38
[RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Steve, Jun 26 '15 at 19:38
Required reading on the subject of parsing markup languages with regex: [link](http://stackoverflow.com/a/1732454/335858) – Sergey Kalinichenko, Jun 26 '15 at 19:39
Thanks for the feedback, I've just started to look at the HTML Agility Pack. – Zak, Jun 26 '15 at 19:48

score 0 · Answer 1 · answered Jun 26 '15 at 19:44

Use a real HTML parser like HtmlAgilityPack:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlstring);
var link = doc.DocumentNode.SelectSingleNode("//a[@class='_k_ hoverZoomLink']")
              .Attributes["href"].Value;
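One caveat with the snippet above: `SelectSingleNode` returns `null` when nothing matches, and `[@class='...']` compares the whole attribute string exactly, so any difference in the class list ends in a `NullReferenceException`. A more forgiving variant might look like the sketch below. It assumes that `_k_` is the class token that identifies the link and that extra classes (such as `hoverZoomLink`) may or may not be present in the downloaded HTML; `ThreadLinks` and `GetHrefs` are made-up names for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ThreadLinks
{
    // Returns the href of every <a> whose class list contains the token "_k_".
    // Assumption: "_k_" marks the thread link; other classes are ignored.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Token-based class test: pad the class list with spaces so that
        // " _k_ " only matches a whole class name, not a substring.
        var nodes = doc.DocumentNode.SelectNodes(
            "//a[contains(concat(' ', normalize-space(@class), ' '), ' _k_ ')]");

        var hrefs = new List<string>();
        if (nodes == null)
        {
            // SelectNodes returns null (not an empty collection) when nothing matches.
            return hrefs;
        }

        foreach (var node in nodes)
        {
            // GetAttributeValue falls back to the default instead of throwing
            // when the attribute is missing.
            string href = node.GetAttributeValue("href", string.Empty);
            if (href.Length > 0)
            {
                hrefs.Add(href);
            }
        }
        return hrefs;
    }
}
```

Calling `ThreadLinks.GetHrefs(htmlText)` then gives one entry per matching link, or an empty list if the markup does not contain any.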

Instead of XPath, you can also use LINQ:

var link = doc.DocumentNode.Descendants("a")
               .Where(a => a.Attributes["class"] != null)
               .First(a => a.Attributes["class"].Value == "_k_ hoverZoomLink")
               .Attributes["href"].Value;
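For completeness, since the question asks specifically for regex: the text between two fixed strings can be pulled out with a capture group. The following is only a sketch, and it is more fragile than a parser. It assumes the `class` attribute comes before `href` inside the tag, that both use double quotes, and that `_k_` is the class token of interest; `HrefExtractor` and `ExtractHrefs` are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // <a ... class="... _k_ ..." ... href="(captured)"
    // [^>]* keeps the match inside a single tag; [^"]* keeps it inside one attribute value.
    private static readonly Regex LinkPattern = new Regex(
        "<a\\b[^>]*\\bclass=\"[^\"]*\\b_k_\\b[^\"]*\"[^>]*\\bhref=\"(?<href>[^\"]*)\"",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            // The named group holds only the text between href=" and the closing quote.
            // HtmlDecode turns entities such as &amp; back into plain characters.
            hrefs.Add(WebUtility.HtmlDecode(m.Groups["href"].Value));
        }
        return hrefs;
    }
}
```

`HrefExtractor.ExtractHrefs(htmlText)` returns the links in document order, so if every thread has exactly one such link the i-th entry could fill the empty last argument of `new Thread(...)` in the question's loop. That one-to-one pairing is an assumption, not something the markup guarantees.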

EZI


This gives me an exception: "A first chance exception of type 'System.NullReferenceException' occurred in Facebook Messenger.exe", followed by "A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll". – Zak Jun 26 '15 at 20:23
@Zak See the class name `_k_ hoverZoomLink`. Is there a blank? Should it be `_k_hoverZoomLink`? All you have to do is use the correct value for the attributes. – EZI Jun 26 '15 at 20:26
http://i.imgur.com/ASbOgKb.png The regex parsing of the usernames etc. works completely fine. Double-clicking the `a class="` value produces this: `"_k_ hoverZoomLink"` – Zak Jun 26 '15 at 20:42
@Zak What do you want to do? Parse the HTML in a correct, simple and maintainable way, or convince me to use regex? – EZI Jun 26 '15 at 20:49
