
I have a stream of HTML code that looks like this:

<br><br><font color=Blue>Item Name:</font> My first item<br>
<font color=Blue>Item Type:</font> My item type<br>
<font color=Blue>Item Color:</font> My item color<br><br>

My idea is to take the text between each > sign and the next < sign, which yields strings such as Item Name: or My first item. However, if there is only a single character in between, as in >0< or >#<, it should not be stored.

How can I do this in C# and write the output to the console?

Manfred Radlwimmer
  • Please don't use Regex to parse HTML. HTML is not a regular language, and regular expressions are, by definition, designed to match regular languages only. Try HtmlAgilityPack - see https://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c – Enigmativity Nov 02 '19 at 22:51
  • Obligatory link to the best answer to this question ever: https://stackoverflow.com/a/1732454/3214843 – Manfred Radlwimmer Nov 02 '19 at 22:53
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Manfred Radlwimmer Nov 02 '19 at 22:53
  • FYI, "How to parse HTML with RegEx" might be one of the most-asked, most-closed C# questions (apart from questions about DateTime formats), so spoiler alert: You don't! Use an appropriate parser instead (like the one Enigma suggested, HtmlAgilityPack). – Manfred Radlwimmer Nov 02 '19 at 22:56

1 Answer


Step 1. Don't :) https://stackoverflow.com/a/1732454/3214843

I'm warned; I want to parse html using regex

Here is a crude match that extracts the human-readable strings from your piece of HTML.

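// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
string input = @"<br><br><font color=Blue>Item Name:</font> My first item<br>
<font color=Blue>Item Type:</font> My item type<br>
<font color=Blue>Item Color:</font> My item color<br><br>";

// [^>]* stops at the end of the opening <font> tag instead of relying on
// backtracking from a greedy .*
// Group 1 is the label inside <font>...</font>; group 2 is the value up to the next <br>.
var pattern = "<font color=[^>]*>(.*?)</font>(.*?)<br>";

var matches = Regex.Matches(input, pattern);

// Project each match to a (label, value) tuple.
var output2 = matches
            .Select(m => (m.Groups[1].Value, m.Groups[2].Value))
            .ToList();

foreach (var o in output2) Console.WriteLine(o);
// Output on .NETCoreApp,Version=v3.0 (group 2 keeps its leading space):
// (Item Name:,  My first item)
// (Item Type:,  My item type)
// (Item Color:,  My item color)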
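
The question also asked for every string between a > and the next <, skipping anything that is only one character long. A minimal sketch of that idea follows; the class name TagTextExtractor and the method ExtractTexts are hypothetical names chosen for illustration, and the extra `<b>0</b><i>#</i>` at the end of the input is added here only to exercise the single-character filter. It assumes reasonably regular markup like the sample; text before the first tag or after the last one is not captured, and for arbitrary HTML a real parser such as HtmlAgilityPack remains the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagTextExtractor
{
    // Hypothetical helper (not part of the original answer): returns every run
    // of text found between a '>' and the next '<', trimmed, keeping only runs
    // that are longer than one character.
    public static List<string> ExtractTexts(string html)
    {
        var results = new List<string>();

        // [^<>]+ matches one or more characters that are neither '<' nor '>'.
        foreach (Match m in Regex.Matches(html, ">([^<>]+)<"))
        {
            string text = m.Groups[1].Value.Trim();

            // Skip empty strings (e.g. a line break between tags) and
            // single characters such as "0" or "#".
            if (text.Length > 1)
            {
                results.Add(text);
            }
        }

        return results;
    }

    public static void Main()
    {
        string input =
            "<br><br><font color=Blue>Item Name:</font> My first item<br>\n" +
            "<font color=Blue>Item Type:</font> My item type<br>\n" +
            "<font color=Blue>Item Color:</font> My item color<br><br>\n" +
            "<b>0</b><i>#</i>";

        foreach (string text in ExtractTexts(input))
        {
            Console.WriteLine(text);
        }
    }
}
```

For the input above this prints the six strings Item Name:, My first item, Item Type:, My item type, Item Color: and My item color, one per line; the "0" and "#" entries are dropped by the length check.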
tymtam