
I have the tag below in a variable. I need to extract the values of type and id into separate variables using C#. What would be the best approach?

<a href="gana:$type=FlexiPage;id=c828c4ea-075d-4dde-84f0-1876f8b71fa8;title=Workflow%20flexi$">workflow link</a>
samithagun
  • You could use Regex – Lucifer Sep 08 '17 at 11:12
  • @Lucifer you could. But it would be a [really](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) bad idea – Jamiec Sep 08 '17 at 11:12
  • Something presumably builds up that HTML tag - can't you get these values from the source? This looks awfully like an XY Problem to me. – Jamiec Sep 08 '17 at 11:13
  • @Jamiec No, I cannot, actually. Somehow I need to extract them from the above. – samithagun Sep 08 '17 at 11:14

2 Answers

I would also use HtmlAgilityPack if I had to parse HTML. You can use SelectSingleNode, GetAttributeValue and string methods to build a dictionary of key-value pairs:

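using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// html holds the <a> tag from the question
var doc = new HtmlDocument();
doc.LoadHtml(html);
var anchor = doc.DocumentNode.SelectSingleNode("//a");
string href = anchor.GetAttributeValue("href", "");

// take the text between the first and the last $
int startIndex = href.IndexOf('$') + 1;
int endIndex = href.LastIndexOf('$');
href = href.Substring(startIndex, endIndex - startIndex);

// split "key=value;key=value" into a case-insensitive dictionary
Dictionary<string, string> pageInfos = href.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(token => token.Split(new[] { '=' }, 2))
    .Where(kv => kv.Length == 2)
    .ToDictionary(kv => kv[0].Trim(), kv => kv[1].Trim(), StringComparer.InvariantCultureIgnoreCase);
string id = pageInfos["id"];     // c828c4ea-075d-4dde-84f0-1876f8b71fa8
string type = pageInfos["type"]; // FlexiPage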
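If you only want to check the string handling, the key/value step can be tried on its own without HtmlAgilityPack. This is a minimal sketch using only the base class library; ParseGanaHref is a hypothetical helper name, and it assumes the href value has already been extracted and, as in the question's sample, carries its payload between the first and last $ with ; and = as separators.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the answer): parses the "key=value;key=value"
// payload that sits between the first and the last '$' of the href value.
Dictionary<string, string> ParseGanaHref(string hrefValue)
{
    int start = hrefValue.IndexOf('$') + 1;
    int end = hrefValue.LastIndexOf('$');
    if (start == 0 || end < start)
        throw new FormatException("Expected the payload to be wrapped in '$' characters.");

    return hrefValue.Substring(start, end - start)
        .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(token => token.Split(new[] { '=' }, 2)) // at most 2 parts: key and value
        .Where(kv => kv.Length == 2)                    // skip tokens without '='
        .ToDictionary(kv => kv[0].Trim(), kv => kv[1].Trim(), StringComparer.OrdinalIgnoreCase);
}

var info = ParseGanaHref("gana:$type=FlexiPage;id=c828c4ea-075d-4dde-84f0-1876f8b71fa8;title=Workflow%20flexi$");
Console.WriteLine(info["type"]); // FlexiPage
Console.WriteLine(info["id"]);   // c828c4ea-075d-4dde-84f0-1876f8b71fa8
```

Splitting each token with a limit of 2 keeps any = inside a value intact, and the case-insensitive comparer lets you look up "id" or "ID" interchangeably.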
Tim Schmelter

You can use HTML Agility Pack to read the href attribute, and then apply a regex to the attribute value:

using System.Linq;
using HtmlAgilityPack;

// html holds the <a> tag from the question
var doc = new HtmlDocument();
doc.LoadHtml(html);

// With XPath
string hrefValue = doc.DocumentNode
    .SelectNodes("//a")
    .First()
    .Attributes.First(a => a.Name == "href")
    .Value;

// With LINQ
string hrefAttributeValue = doc.DocumentNode.Descendants("a")
    .Select(a => a.GetAttributeValue("href", ""))
    .First();
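The regex step on the attribute value is not shown above. Here is a minimal sketch of it, assuming the href has already been read and follows the question's format: a payload delimited by $ with ;-separated key=value pairs. GetHrefValue is a hypothetical helper name. Because the pattern runs only on the attribute string and not on the HTML, it avoids the problems with regex-parsing markup raised in the comments.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): returns the value for one key,
// or null if the key is absent. Assumes keys follow a '$' or ';' and values
// end at the next ';' or '$'.
string? GetHrefValue(string hrefValue, string key)
{
    var match = Regex.Match(
        hrefValue,
        @"[$;]" + Regex.Escape(key) + @"=(?<value>[^;$]*)",
        RegexOptions.IgnoreCase);
    return match.Success ? match.Groups["value"].Value : null;
}

string href = "gana:$type=FlexiPage;id=c828c4ea-075d-4dde-84f0-1876f8b71fa8;title=Workflow%20flexi$";
string? type = GetHrefValue(href, "type");
string? id = GetHrefValue(href, "id");
Console.WriteLine($"{type} {id}"); // FlexiPage c828c4ea-075d-4dde-84f0-1876f8b71fa8
```

The [$;] prefix ties the key to a token boundary, so looking up "id" cannot accidentally match the tail of a longer key such as "pid".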
Boris Modylevsky