
I am working on a simple Facebook Messenger client (one that doesn't need a developer account). So far I can get all my messages: name, preview, and time. What I'd like to get next is each user's `href` link.

So far I have this:

    MatchCollection name = Regex.Matches(
        htmlText, "<div class=\"_l2\">(.*?)</div>");

    MatchCollection preview = Regex.Matches(
        htmlText, "<div class=\"_l3 fsm fwn fcg\">(.*?)</div>");

    MatchCollection time = Regex.Matches(
        htmlText, "<div class=\"_l4\">(.*?)</div>");

This part works fully.

But I've tried a few things I found on this site, and nothing seemed to work. The link starts with `<a class="_k_ hoverZoomLink" rel="ignore" href="` and ends with a `"`.

Could someone point me to an article that would help me get that `href`? A better approach than regex is also welcome, although I would really prefer regex. Here is the loop that processes the matches:

    for (int i = 0; i < name.Count; i++)
    {
        String resultName = Regex.Replace(name[i].Value, @"<[^>]*>", String.Empty);
        String newName = resultName.Substring(0, resultName.Length - 5);
        String resultPreview = Regex.Replace(preview[i].Value, @"<[^>]*>", String.Empty);
        String s = time[i].Value;
        int start = s.IndexOf("data-utime=\"") + 28;
        int end = s.IndexOf("</abbr>", start);
        String newTime = s.Substring(start, (end - start));
        threads.Add(new Thread(newName, resultPreview, newTime, ""));
    }
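For comparison with the parser-based answer, here is a minimal regex sketch. It assumes the tag looks exactly like the snippet quoted above: double-quoted attributes, the exact class value `_k_ hoverZoomLink`, and `class` appearing before `href` inside the same `<a ...>` tag. `HrefExtractor` and `ExtractHrefs` are hypothetical names. It is brittle for the reasons raised in the comments: a change in attribute order, quoting, or class list breaks it.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the href of every <a class="_k_ hoverZoomLink" ...> tag.
public static class HrefExtractor
{
    // Assumes double-quoted attributes, the exact class value "_k_ hoverZoomLink",
    // and that class appears before href inside the same <a ...> tag.
    // [^>]* keeps each match from running past the end of the opening tag.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*class=\"_k_ hoverZoomLink\"[^>]*\\shref=\"([^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Returns the href values in document order.
    public static List<string> ExtractHrefs(string htmlText)
    {
        var hrefs = new List<string>();
        foreach (Match match in HrefPattern.Matches(htmlText))
        {
            // Group 1 is the raw attribute value; decode entities such as &amp;.
            hrefs.Add(WebUtility.HtmlDecode(match.Groups[1].Value));
        }
        return hrefs;
    }
}
```

Calling `HrefExtractor.ExtractHrefs(htmlText)` on the same `htmlText` used above returns the links in document order, so they could in principle be paired with `name[i]` by index, provided every thread has exactly one such link (an assumption worth checking against the real markup).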

Thanks in advance.

EZI
Zak
  • I would look into using the HTML Agility Pack and XPath, not regular expressions. – hwnd Jun 26 '15 at 19:38
  • [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Steve Jun 26 '15 at 19:38
  • Required reading on the subject of parsing markup languages with regex: [link](http://stackoverflow.com/a/1732454/335858) – Sergey Kalinichenko Jun 26 '15 at 19:39
  • Thanks for the feedback, I've just started to look at the HTML Agility Pack. – Zak Jun 26 '15 at 19:48

1 Answer


Use a real HTML parser such as HtmlAgilityPack:

    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(htmlstring);
    var link = doc.DocumentNode.SelectSingleNode("//a[@class='_k_ hoverZoomLink']")
                  .Attributes["href"].Value;

Instead of XPath, you can also use LINQ:

    var link = doc.DocumentNode.Descendants("a")
                   .Where(a => a.Attributes["class"] != null)
                   .First(a => a.Attributes["class"].Value == "_k_ hoverZoomLink")
                   .Attributes["href"].Value;
EZI
  • This gives me an exception: `A first chance exception of type 'System.NullReferenceException' occurred in Facebook Messenger.exe` followed by `A first chance exception of type 'System.Reflection.TargetInvocationException' occurred in mscorlib.dll` – Zak Jun 26 '15 at 20:23
  • @Zak Look at the class name `_k_ hoverZoomLink`. Is there a space in it? Should it be `_k_hoverZoomLink`? All you have to do is use the correct attribute value. – EZI Jun 26 '15 at 20:26
  • http://i.imgur.com/ASbOgKb.png The regex parsing of the usernames etc. works completely fine. Double-clicking the `a class="` value produces this: `"_k_ hoverZoomLink"` – Zak Jun 26 '15 at 20:42
  • @Zak What do you want to do? Parse HTML in a correct, simple, and maintainable way, or convince me to use regex? – EZI Jun 26 '15 at 20:49
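The `NullReferenceException` above most likely comes from the XPath version: when `SelectSingleNode` finds no matching element it returns `null`, and `.Attributes` is then read from `null`. A more defensive sketch, assuming the HtmlAgilityPack NuGet package is referenced (`ProfileLinkFinder` and `FindProfileLinks` are hypothetical names), matches `hoverZoomLink` as one token of the `class` attribute, so it does not depend on the exact spacing or order of the class list, and returns an empty list instead of throwing.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper: null-safe lookup of links whose class list contains "hoverZoomLink".
public static class ProfileLinkFinder
{
    // XPath 1.0 token match: pads the normalized class attribute with spaces so that
    // " hoverZoomLink " matches only the whole class name, not a substring of another one.
    private const string LinkXPath =
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' hoverZoomLink ')][@href]";

    // Returns the hrefs of all matching links, or an empty list when there are none.
    public static List<string> FindProfileLinks(string htmlText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlText);

        var result = new List<string>();
        // By default SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(LinkXPath);
        if (nodes == null)
        {
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            // GetAttributeValue returns the supplied default instead of throwing.
            string href = node.GetAttributeValue("href", string.Empty);
            if (href.Length > 0)
            {
                result.Add(href);
            }
        }
        return result;
    }
}
```

If this returns an empty list for your `htmlText`, the string being parsed probably does not contain that link at all, so it is worth inspecting the string itself before adjusting the class name.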