
I'm working on retrieving data from an HTML string.

The string looks like this:

<html dir="ltr">

<head>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
    <title>Výstup reportu</title>
    <style>
        table.list {
            border-collapse: collapse;
        }
    </style>
</head>
<!script!>

<body bgcolor="#E8EAD8">
    <blockquote>
        <p><font size=+2><b> </b></font></p>
        <p> <font style="font-family:monospaced"> <table  class="list" border=1 cellspacing=0 cellpadding=1 rules=groups borderColor=black ><colgroup><colgroup>  <tbody><tr><td style= background:#5dcbfd ><font face="courier new" size="2"><nobr   id=l0002003>Statistika&nbsp;dat&nbsp;</nobr></font></td>
            <td style=b ackground:#5dcbfd><font face="courier new" size="2"><nobr   id=l0002019>&nbsp;Počet</nobr></font></td>
            </tr>
            <tbody>
                <tr>
                    <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0004003>Předan&#xe9;&nbsp;z&#xe1;znamy</nobr></font></td>
                    <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0004019>18.048</nobr></font></td>
                </tr>
                <tbody></tbody>
                </table><font face="courier new" size="2"><span style="white-space:nowrap"><font face="courier new" size="2" color=#0273bc><nobr style= background:#E8EAD8 id=l0006002>16.10.2018&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mbew_wg&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0</nobr></font></span>
                </font>
                <br><font face="courier new" size="2"><nobr><strike>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</strike></nobr></font>
                <br><font face="courier new" size="2"><span style="white-space:nowrap"><font face="courier new" size="2" color=#0273bc><nobr style= background:#E8EAD8 id=l0008002>mbew_wg</nobr></font></span>
                </font>
                <br>
                <table class="list" border=1 cellspacing=0 cellpadding=1 rules=groups borderColor=black>
                    <colgroup>
                        <colgroup>
                            <colgroup>
                                <tbody>
                                    <tr>
                                        <td style=b ackground:#5dcbfd><font face="courier new" size="2"><nobr   id=l0010003>OkOc</nobr></font></td>
                                        <td style=b ackground:#5dcbfd><font face="courier new" size="2"><nobr   id=l0010008>Artikl&nbsp;&nbsp;</nobr></font></td>
                                        <td style=b ackground:#5dcbfd><font face="courier new" size="2"><nobr   id=l0010017>celkem&nbsp;bez&nbsp;trans</nobr></font></td>
                                    </tr>
                                    <tbody>
                                        <tr>
                                            <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0012003>1210</nobr></font></td>
                                            <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0012008>xxx</nobr></font></td>
                                            <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0012017>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0&nbsp;</nobr></font></td>
                                        </tr>
                                            <tr>
                                                <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0013003>1210</nobr></font></td>
                                                <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0013008>xxx</nobr></font></td>
                                                <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0013017>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0&nbsp;</nobr></font></td>
                                            </tr>
.....

 <tbody></tbody>
                </table><font face="courier new" size="2"><span style="white-space:nowrap"><font face="courier new" size="2" color=#0273bc><nobr style= background:#E8EAD8 id=l0070002>16.10.2018&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mbew_wg&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3</nobr></font></span>
                </font>
                <br><font face="courier new" size="2"><nobr><strike>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</strike></nobr></font>
                <br>
                <table class="list" border=1 cellspacing=0 cellpadding=1 rules=groups borderColor=black>
                    <colgroup>
                        <colgroup>
                            <colgroup>
                                <tbody>
                                    <tr>
                                        <td style=b ackground:#5dcbfd><font face="courier new" size="2"><nobr   id=l0073003>OkOc</nobr></font></td>
                                        <td style=b ackground:#5dcbfd><font face="courier new" size="2"><nobr   id=l0073008>Artikl&nbsp;&nbsp;</nobr></font></td>
                                        <td style=b ackground:#5dcbfd><font face="courier new" size="2"><nobr   id=l0073017>celkem&nbsp;bez&nbsp;trans</nobr></font></td>
                                    </tr>
                                    <tbody>
                                        <tr>
                                            <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0075003>1210</nobr></font></td>
                                            <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0075008>yyy</nobr></font></td>
                                            <td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0075017>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0&nbsp;</nobr></font></td>
                                        </tr>

etc

I am interested in getting the xxx value from this line:

<td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0012008>xxx</nobr></font></td>

and the 0 from this line (the value between the run of **&nbsp;** and the trailing **&nbsp;</nobr></font></td>**):

<td style=b ackground:#eef9ff><font face="courier new" size="2"><nobr   id=l0012017>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0&nbsp;</nobr></font></td>

I would like to have an object, for example

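public class Result
{
    // Text of the "Artikl" cell, e.g. "xxx"
    public string Value1 { get; set; }
    // Text of the "celkem bez trans" cell, e.g. "0"
    public string Value2 { get; set; }
}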

where I can store the data retrieved from html.

How can I do that? I know I can use a regular expression, but my regex is not working as I expect.

So far my code looks like this:

var html = myDataAsString; 

var matchForValue1 = @"nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(.+?)&nbsp;</nobr></font></td";

// var matchForValue2 = ?  (I have no idea)

The first pattern matches some unrelated values, not the ones I want.

Alice
  • Possible duplicate of [What is the best way to parse html in C#?](https://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c) – Renatas M. Oct 30 '18 at 14:26
  • The first answer to "How do I use regex to get data from HTML?" is usually "Don't". You are generally much better off using an HTML parsing library to do this job. – asherber Oct 30 '18 at 15:22
  • @asherber I was somewhat opposed to the idea of third-party libraries, but I ended up using Html Agility Pack and it worked. Thanks for the comment – Alice Oct 30 '18 at 15:29
  • HtmlAgilityPack has been around for a long time and is very reliable. I'm glad it's working for you. – asherber Oct 30 '18 at 15:33
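
For readers who cannot add a parser dependency, here is a minimal regex-only sketch using just the .NET base library. It is not the approach the asker ended up with (that was Html Agility Pack), and it rests on assumptions read off the sample above: each `nobr` id looks like `l` + a 4-digit row number + a 3-digit column offset, column `008` is "Artikl", column `017` is "celkem bez trans", and data rows use the `#eef9ff` background while header rows use `#5dcbfd`. The original pattern anchors only on a run of `&nbsp;`, which also appears in the date and separator lines, so it can plausibly start matching there. Keying on the id avoids that. `ReportParser`, `Parse` and the property names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public class Result
{
    public string Value1 { get; set; }   // "Artikl" cell
    public string Value2 { get; set; }   // "celkem bez trans" cell
}

public static class ReportParser
{
    // One data cell: a <td> whose style mentions #eef9ff, then <font>, then
    // <nobr id=lRRRRCCC>text</nobr>. The id layout is an assumption from the sample.
    private static readonly Regex Cell = new Regex(
        @"<td[^>]*#eef9ff[^>]*>\s*<font[^>]*>\s*<nobr\s+id=l(?<row>\d{4})(?<col>\d{3})>(?<text>.*?)</nobr>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<Result> Parse(string html)
    {
        // Cells that share a row number belong to the same Result.
        var rows = new SortedDictionary<string, Result>(StringComparer.Ordinal);
        foreach (Match m in Cell.Matches(html))
        {
            string col = m.Groups["col"].Value;
            if (col != "008" && col != "017") continue;   // ignore other columns

            // Decode entities (&nbsp; becomes U+00A0), then strip the padding.
            string text = WebUtility.HtmlDecode(m.Groups["text"].Value)
                .Replace('\u00A0', ' ')
                .Trim();

            string row = m.Groups["row"].Value;
            if (!rows.TryGetValue(row, out Result r))
            {
                r = new Result();
                rows[row] = r;
            }
            if (col == "008") r.Value1 = text; else r.Value2 = text;
        }
        return new List<Result>(rows.Values);
    }
}

public static class Program
{
    public static void Main()
    {
        // Two cells copied from the sample row l0012, with shortened padding.
        string sample =
            "<tr><td style=b ackground:#eef9ff><font face=\"courier new\" size=\"2\"><nobr   id=l0012008>xxx</nobr></font></td>" +
            "<td style=b ackground:#eef9ff><font face=\"courier new\" size=\"2\"><nobr   id=l0012017>&nbsp;&nbsp;0&nbsp;</nobr></font></td></tr>";

        foreach (Result r in ReportParser.Parse(sample))
            Console.WriteLine(r.Value1 + " = " + r.Value2);
    }
}
```

If the report generator changes its id scheme or colours, this pattern silently stops matching, which is the usual argument for a real HTML parser in the comments above.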

0 Answers