I grabbed the description of an Outlook appointment and got this string:

<html>
<head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
  ID: 123456<br>
  Comments: blah blah
</body>
</html>

I need to extract the ID value (123456) and the Comments value with C# code. I can only use the standard .NET library, so I can't use Html Agility Pack. I tried something like this:

var index = html.IndexOf("ID");
var IDindex = index + "ID".Length + 2;      // skip "ID" plus the ": " separator
var IDvalue = html.Substring(IDindex, 6);   // hard-coded length of 6

But I'd like something more robust that can handle, for example, a change in the length of the ID.

James Z
Meidi
  • You could definitely write a regular expression to extract it. – ameer Dec 07 '17 at 19:45
  • A more robust solution is to use Html Agility Pack or some other library that is specifically designed to parse HTML. – crashmstr Dec 07 '17 at 19:47
  • @ameer Obvious troll re: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 ? – Yuck Dec 07 '17 at 19:47
  • @crashmstr OP states _I can only use standard .NET library, that is, I can't use html agility pack_ – Yuck Dec 07 '17 at 19:48
  • @Yuck yes, but that is the *robust answer*. Nothing else will be *robust*. Sometimes it is better to change the rules than write fragile code. – crashmstr Dec 07 '17 at 19:49
  • @crashmstr See also: https://en.wiktionary.org/wiki/can%27t – Yuck Dec 07 '17 at 19:56
  • @Yuck see also: [Kobayashi Maru](https://en.wikipedia.org/wiki/Kobayashi_Maru) – crashmstr Dec 07 '17 at 19:57
  • You're not trying to parse/interpret HTML, you're just trying to extract a value from a string. Regex is just fine: `ID: ([0-9]{1,})\` – Paul Abbott Dec 07 '17 at 20:02
  • @Yuck why wouldn't something like `ID:\s*(\d+)` work – ameer Dec 07 '17 at 20:04
  • Parsing HTML with regex is not recommended; you can get varied results. – EasyE Dec 07 '17 at 20:20

1 Answer

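I would use a regular expression match and read the first captured group, with a pattern like ID:\s*(\d+)<br />. Note that this pattern expects a literal <br /> after the digits; the HTML in the question uses <br>, so for that input drop the tag from the pattern (ID:\s*(\d+)) or adjust it to match.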

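using System;
using System.Text.RegularExpressions;

namespace RegexExample
{
    class Program
    {
        static void Main(string[] args)
        {
            // \s* allows any whitespace after the colon; (\d+) captures an ID of any length.
            foreach (Match match in Regex.Matches("ID: 12345<br />", @"ID:\s*(\d+)<br />"))
                Console.WriteLine(match.Groups[1].Value); // Groups[0] is the whole match, Groups[1] the digits
        }
    }
}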
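
The same idea can be stretched to cover the Comments value from the question as well. What follows is only a sketch under some assumptions: the class and method names (AppointmentParser, ExtractId, ExtractComments) are made up for illustration, the comment text is assumed to run from "Comments:" up to the next tag, and the markup is assumed to stay as simple as the sample. It uses only System.Text.RegularExpressions and System.Net.WebUtility from the standard library.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class AppointmentParser
{
    // Returns the digits after "ID:", or null when no ID is found.
    public static string ExtractId(string html)
    {
        Match m = Regex.Match(html, @"ID:\s*(\d+)");
        return m.Success ? m.Groups[1].Value : null;
    }

    // Returns the text after "Comments:" up to the next tag (or end of input),
    // or null when the label is missing. Assumes the comment contains no '<'.
    public static string ExtractComments(string html)
    {
        Match m = Regex.Match(html, @"Comments:\s*(.*?)\s*(?:<|$)", RegexOptions.Singleline);
        // HtmlDecode turns entities such as &amp; back into plain characters.
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }
}

class Program
{
    static void Main()
    {
        string html = "<body>\n  ID: 123456<br>\n  Comments: blah blah\n</body>";
        Console.WriteLine(AppointmentParser.ExtractId(html));       // 123456
        Console.WriteLine(AppointmentParser.ExtractComments(html)); // blah blah
    }
}
```

Because the ID pattern no longer includes the tag, it does not care whether Outlook emits <br> or <br />. If the description can contain nested markup inside the comment, this approach will cut the comment off at the first tag.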
ameer