
Please help me remove all the extra Facebook link-shim markup from the HTML below, using the C# .NET `Regex.Replace` method.

Example input:

<a href="/l.php?u=http%3A%2F%2Fon.fb.me%2FOE6gnB&amp;h=yAQFjL0pt&amp;s=1" target="_blank" rel="nofollow nofollow" onmouseover="LinkshimAsyncLink.swap(this, &quot;http:\/\/on.fb.me\/OE6gnB&quot;);" onclick="LinkshimAsyncLink.swap(this, &quot;\/l.php?u=http\u00253A\u00252F\u00252Fon.fb.me\u00252FOE6gnB&amp;h=yAQFjL0pt&amp;s=1&quot;);">http://on.fb.me/OE6gnB</a>somehtml

Desired output:

somehtml <a href="http://on.fb.me/OE6gnB">on.fb.me/OE6gnB</a> somehtml

I tried the following regex patterns, but they didn't work for me:

    searchPattern = "<a([.]*)?/l.php([.]*)?(\">)?([.]*)?(</a>)?";
    replacePattern = "<a href=\"$3\" target=\"_blank\">$3</a>";
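A likely reason this attempt fails: inside square brackets `.` loses its special meaning, so `[.]*` matches only runs of literal dots and never the attribute text between `<a` and `/l.php`. The small console program below (the class name `CharClassDemo` is only for illustration) shows the difference, and also how a lazy `.*?` stops at the first position where the rest of the pattern can match:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    public static void Main()
    {
        // Inside a character class '.' is a literal dot, so "[.]*" matches only dots.
        Console.WriteLine(Regex.IsMatch("abc", "^[.]*$")); // False
        Console.WriteLine(Regex.IsMatch("...", "^[.]*$")); // True

        // Outside a class '.' matches any character except '\n'.
        Console.WriteLine(Regex.IsMatch("abc", "^.*$"));   // True

        // A lazy ".*?" takes as little as possible, so only the first anchor's text is captured.
        Match m = Regex.Match("<a x>one</a><a y>two</a>", "<a.*?>(.*?)</a>");
        Console.WriteLine(m.Groups[1].Value);              // one
    }
}
```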

Thanks

vikas kumar
http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg gives examples of why HTML and XML are hard to parse with a regex. There are several libraries that can extract data from HTML/XML, one of them being http://htmlagilitypack.codeplex.com/, so why reinvent the wheel? – Thomas Lindvall Aug 14 '12 at 08:08

2 Answers


I managed to do this with a regex, using the following code:

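    // Requires: using System.Text.RegularExpressions; using System.Web;
    // Group 2 captures the URL-encoded target; group 4 captures the link text.
    string searchPattern = "<a(.*?)href=\"/l\\.php\\?u=(.*?)&amp;?(.*?)>(.*?)</a>";
    string html1 = Regex.Replace(html, searchPattern, delegate(Match oMatch)
    {
        return string.Format("<a href=\"{0}\" target=\"_blank\">{1}</a>",
            HttpUtility.UrlDecode(oMatch.Groups[2].Value), oMatch.Groups[4].Value);
    });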
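For comparison, here is a self-contained sketch of the same regex idea with the `.` and `?` escaped and a named group for the target. Assumptions that are not from the answer above: the shim `href` is relative (`/l.php?u=...`), no attribute value inside the opening tag contains `>`, and `System.Net.WebUtility.UrlDecode` is used so no System.Web reference is needed. The class name `LinkshimRegex` is hypothetical, and the link text is rebuilt from the decoded URL to match the desired output in the question.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class LinkshimRegex
{
    // "url" captures the URL-encoded value of the "u" parameter (up to the first '&' or '"').
    // [^>]* then skips the remaining attributes, assuming none of them contains '>'.
    private static readonly Regex ShimAnchor = new Regex(
        "<a\\s[^>]*?href=\"/l\\.php\\?u=(?<url>[^&\"]*)[^>]*>.*?</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Clean(string html)
    {
        return ShimAnchor.Replace(html, m =>
        {
            // Turns "http%3A%2F%2F..." back into "http://...".
            string url = WebUtility.UrlDecode(m.Groups["url"].Value);
            // Link text is the URL without its "http://" prefix, as in the desired output.
            string text = url.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
                ? url.Substring("http://".Length)
                : url;
            return string.Format("<a href=\"{0}\">{1}</a>", url, text);
        });
    }

    public static void Main()
    {
        string html = @"<a href=""/l.php?u=http%3A%2F%2Fon.fb.me%2FOE6gnB&amp;h=yAQFjL0pt&amp;s=1"" target=""_blank"" rel=""nofollow nofollow"" onmouseover=""LinkshimAsyncLink.swap(this, &quot;http:\/\/on.fb.me\/OE6gnB&quot;);"" onclick=""LinkshimAsyncLink.swap(this, &quot;\/l.php?u=http\u00253A\u00252F\u00252Fon.fb.me\u00252FOE6gnB&amp;h=yAQFjL0pt&amp;s=1&quot;);"">http://on.fb.me/OE6gnB</a>somehtml";
        Console.WriteLine(Clean(html));
        // <a href="http://on.fb.me/OE6gnB">on.fb.me/OE6gnB</a>somehtml
    }
}
```

Anchors whose `href` does not start with `/l.php?u=` are left untouched, which keeps ordinary links in the page intact.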
vikas kumar

You can try this approach, which parses the fragment with LINQ to XML instead of a regex (a reference to System.Web has to be added to use System.Web.HttpUtility):

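        // Requires: using System; using System.Linq; using System.Web; using System.Xml.Linq;
        // XDocument.Parse throws if the fragment is not well-formed XML.
        string input = @"<a href=""/l.php?u=http%3A%2F%2Fon.fb.me%2FOE6gnB&amp;h=yAQFjL0pt&amp;s=1"" target=""_blank"" rel=""nofollow nofollow"" onmouseover=""LinkshimAsyncLink.swap(this, &quot;http:\/\/on.fb.me\/OE6gnB&quot;);"" onclick=""LinkshimAsyncLink.swap(this, &quot;\/l.php?u=http\u00253A\u00252F\u00252Fon.fb.me\u00252FOE6gnB&amp;h=yAQFjL0pt&amp;s=1&quot;);"">http://on.fb.me/OE6gnB</a>somehtml";
        // Wrap the fragment in a single root element so it can be parsed as a document.
        string rootedInput = String.Format("<root>{0}</root>", input);
        XDocument doc = XDocument.Parse(rootedInput, LoadOptions.PreserveWhitespace);

        // ToArray() takes a snapshot, so replacing anchors inside the loop is safe.
        var anchors = doc.Descendants("a").ToArray();
        for (int i = anchors.Length - 1; i >= 0; i--)
        {
            XAttribute hrefAttr = anchors[i].Attribute("href");
            if (hrefAttr == null || !hrefAttr.Value.Contains("/l.php?"))
                continue; // leave ordinary links untouched

            // The real target is the "u" query parameter; ParseQueryString URL-decodes it.
            string query = hrefAttr.Value.Substring(hrefAttr.Value.IndexOf('?') + 1);
            string href = HttpUtility.ParseQueryString(query)["u"];
            if (href == null)
                continue;

            XElement newAnchor = new XElement("a");
            newAnchor.SetAttributeValue("href", href);
            newAnchor.SetValue(href.Replace(@"http://", String.Empty));

            anchors[i].ReplaceWith(newAnchor);
        }
        // Serialize and strip the temporary wrapper element again.
        string output = doc.Root.ToString(SaveOptions.DisableFormatting)
                        .Replace("<root>", String.Empty)
                        .Replace("</root>", String.Empty);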
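If adding a System.Web reference is not an option (for example on .NET Core), the `u` parameter can be pulled out of the `href` value with `System.Net.WebUtility.UrlDecode` instead of `HttpUtility.ParseQueryString`. This is only a sketch: the class `ShimQuery` and method `GetShimTarget` are hypothetical names, and it assumes the parameter is literally named `u` and that `&amp;` has already been unescaped to `&` (as it is when the value is read through LINQ to XML).

```csharp
using System;
using System.Net;

public static class ShimQuery
{
    // Returns the URL-decoded value of the "u" query parameter, or null if there is none.
    // Expects "&" separators, i.e. the attribute value after XML unescaping.
    public static string GetShimTarget(string href)
    {
        int q = href.IndexOf('?');
        if (q < 0) return null;

        foreach (string pair in href.Substring(q + 1).Split('&'))
        {
            int eq = pair.IndexOf('=');
            string key = eq < 0 ? pair : pair.Substring(0, eq);
            if (key == "u")
                return WebUtility.UrlDecode(eq < 0 ? string.Empty : pair.Substring(eq + 1));
        }
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(GetShimTarget("/l.php?u=http%3A%2F%2Fon.fb.me%2FOE6gnB&h=yAQFjL0pt&s=1"));
        // http://on.fb.me/OE6gnB
    }
}
```

Returning null for a missing parameter lets the caller skip anchors that are not link shims, the same way a null from the `NameValueCollection` indexer would.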
Ivan Golović