0

I have a string. [It's not just a matter of finding the GUID pattern in a string: I'm using HtmlAgilityPack to parse the markup and convert it into HtmlNodes, and I have to extract this GUID only if the node contains extractable id and type=\"ClickButton\" value='upload'. For simplicity I have left out the other details.]

"\r\n                        <extractable id=\"00000000-0000-0000-0000-000000000000\" class=\"myButtonC\" type=\"ClickButton\" value='upload'>\r\n                    "

I want to extract the GUID from it as part of HTML parsing. I tried the approach below, but it does not seem to work. How do I represent "\" " ? and "=\"" ? I used " as \" and \ as \ for literals. Any suggestions?

private static string ExtractId(string str)       
{
    string eId = string.Empty;
    string[] arrys = str.Split(new string[] {@"\"" "}, StringSplitOptions.None);
    foreach (string[] lists in arrys.Select(t => t.Split(new string[] {@"=\"""}, StringSplitOptions.None)))
    {
        for (int j = 0; j < lists.Length; j++)
        {
            if (lists[j].Contains("extractable id"))
            {
                eId = lists[j + 1];
            }
        }
    }
    return eId;
}
  • Possible duplicate of [Regular Expression to identify a Guid followed by a number](http://stackoverflow.com/questions/29138593/regular-expression-to-identify-a-guid-followed-by-a-number) – Xavier J May 02 '17 at 19:01
  • It's like XML; try using an XML reader. – Shahrooz Ansari May 02 '17 at 19:03
  • ... or [C# Regex for Guid](http://stackoverflow.com/a/11040993/205233) - just looking for the GUID will make things way easier. – Filburt May 02 '17 at 19:03
  • I should have explained a bit more: I have to check whether the list contains "extractable id" and only then extract that GUID, because there may be another GUID in the same HtmlNode. That's one of the reasons I went with string literals to parse and compare... – anotheruser May 02 '17 at 19:07
  • It looks like HTML, so use [HtmlAgilityPack](https://www.nuget.org/packages/HtmlAgilityPack) to parse it. – Hans Kesting May 02 '17 at 19:09
  • I agree with Hans, use HtmlAgilityPack to get the `extractable` node and pull the value from the `id` attribute. – Scott Chamberlain May 02 '17 at 19:13
  • I second treating it as html - if you rely on a character sequence and try to parse it, your code is likely to break if the xml/html changes to `` – Filburt May 02 '17 at 19:14
  • I went with a regex on the innerHTML and excluded the nodes I don't want for now. Thanks, everyone. – anotheruser May 02 '17 at 19:42

2 Answers

3

I suggest using regular expressions to match Guids:

using System;
using System.Linq;
using System.Text.RegularExpressions;

string source = "\r\n <extractable id=\"00000000-0000-0000-0000-000000000000\" class=\"myButtonC\" type=\"ClickButton\" value='upload'>\r\n";

// Find every GUID-shaped substring and convert each match to a Guid.
Guid[] result = Regex
  .Matches(
     source,
    "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")
  .OfType<Match>()
  .Select(match => new Guid(match.Value))
  .ToArray();
Dmitry Bychenko
0

How about using a regex?

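// Hex digits only; RegexOptions.IgnoreCase below also accepts upper-case GUIDs.
string pattern = @"[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}";

MatchCollection mc = Regex.Matches(your_string, pattern, RegexOptions.IgnoreCase);

// Declare the loop variable as Match: with var it is typed as object on .NET Framework.
foreach (Match match in mc)
{
    Guid guid = Guid.Parse(match.Value);
    // do what you want
}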
Ivan Salo
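
Neither answer covers the constraint raised in the comments: return the GUID only when it is the id of an extractable node with type="ClickButton", since other GUIDs may appear in the same markup. A minimal sketch of one way to scope the match is below. The class name `ExtractableId` and the method `Extract` are hypothetical names chosen for illustration, and matching tags with a regex is a simplification that assumes no `>` appears inside an attribute value; HtmlAgilityPack, as the commenters suggest, is likely the more robust route.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ExtractableId
{
    // Matches an <extractable ...> opening tag and captures its attribute list.
    private static readonly Regex TagPattern = new Regex(
        @"<extractable\b(?<attrs>[^>]*)>",
        RegexOptions.IgnoreCase);

    // Captures a GUID assigned to the id attribute (single or double quotes).
    // The lookbehind keeps attributes such as data-id from matching.
    private static readonly Regex IdPattern = new Regex(
        @"(?<![\w-])id\s*=\s*[""'](?<guid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})[""']");

    // Requires type="ClickButton" on the same tag.
    private static readonly Regex TypePattern = new Regex(
        @"(?<![\w-])type\s*=\s*[""']ClickButton[""']");

    // Returns the id of the first matching tag, or null if there is none.
    public static Guid? Extract(string html)
    {
        foreach (Match tag in TagPattern.Matches(html))
        {
            string attrs = tag.Groups["attrs"].Value;
            if (!TypePattern.IsMatch(attrs))
            {
                continue; // an extractable tag, but not a ClickButton
            }

            Match id = IdPattern.Match(attrs);
            if (id.Success)
            {
                return Guid.Parse(id.Groups["guid"].Value);
            }
        }
        return null;
    }
}
```

On the escaping question itself: in a regular C# string literal, `\"` is just a way to write one quote character, so the string held in memory contains no backslashes at all. The verbatim literal `@"\"" "` is different: it holds a backslash, a quote, and a space, a sequence that never occurs in the input, which would explain why `Split` found nothing to split on. In the sketch above the verbatim strings contain `""` for a quote and no extra backslash.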