I receive a very irregular HTML file that looks like this:
<tr class="" rel="30887721">
<td class="leftborder timestamp" rel="1472298782">
<span class="updatets "> 9mins </span>
</td>
<td>
<span>
<style>
.NFK2{display:none}
.gPwA{display:inline}
.Zb70{display:none}
.vFY2{display:inline}
</style>
<span style="display:none">54</span>
<span class="NFK2">54</span>
<div style="display:none">54</div>
<span class="vFY2">124</span>
<span style="display: inline">.</span>
<span class="7">240</span>
<span class="235">.</span>
<div style="display:none">17</div>
<span class="NFK2">62</span>
<span></span>
<span style="display:none">121</span>
<span></span>
<span style="display: inline">187</span>
<span style="display:none">190</span>
<span class="Zb70">190</span>
<span class="NFK2">197</span>
<span></span>
<span style="display: inline">.</span>
<span class="248">80</span>
<div style="display:none">152</div>
<span style="display:none">166</span>
<div style="display:none">166</div>
</span>
</td>
<td> 80 </td>
<td style="text-align:left" class="country" rel="cn">
<span style="white-space:nowrap;">
<img src="/images/1x1.png" style="width: 16px; height: 11px; margin-right: 5px;" class="flags-cn" alt="flag "/>
China
</span>
</td>
<td>
<div class="progress-indicator response_time" style="width: 114px" value="1314" levels="speed" rel="1314">
<div class="indicator" style="width: 87%; background-color: rgb(0, 173, 173)"></div>
</div>
</td>
<td>
<div class="progress-indicator connection_time" style="width: 114px" title="" rel="427" value="427" levels="speed">
<div class="indicator" style="width: 91%; background-color: rgb(0, 173, 173)"></div>
</div>
</td>
<td> HTTP </td>
<td nowrap> High +KA </td>
</tr>
<tr class="altshade" rel="30887719">
<td class="leftborder timestamp" rel="1472298723">
<span class="updatets "> 10mins </span>
</td>
<td>
<span>
<style>
.ZQOg{display:none}
.hAKN{display:inline}
.sZYH{display:none}
.euLE{display:inline}
.pnDV{display:none}
.yf2r{display:inline}
</style>
<span style="display:none">30</span>
<div style="display:none">30</div>
<span class="yf2r">124</span>
<span style="display: inline">.</span>
<span style="display:none">62</span>
<span style="display: inline">244</span>
<span style="display: inline">.</span>
<span class="pnDV">6</span>
<div style="display:none">6</div>
<span class="ZQOg">39</span>
<div style="display:none">39</div>
<span style="display:none">71</span>
<div style="display:none">71</div>
<span style="display:none">103</span>
<span class="sZYH">103</span>
<span></span>
<span class="euLE">157</span>
<span style="display:none">188</span>
<div style="display:none">188</div>
<div style="display:none">208</div>
<span style="display:none">220</span>
<div style="display:none">220</div>
<span class="sZYH">231</span>
<span style="display:none">241</span>
<span class="hAKN">.</span>
<span class="sZYH">26</span>
<span></span>
<span class="sZYH">31</span>
<span></span>
<span style="display:none">66</span>
<div style="display:none">66</div>
<span style="display:none">84</span>
<span class="pnDV">84</span>
<span></span>
<span style="display:none">166</span>
<span class="sZYH">166</span>
<div style="display:none">166</div>
<span style="display:none">207</span>
<span></span>
<span style="display: inline">209</span>
<span class="sZYH">212</span>
<div style="display:none">212</div>
<span style="display:none">241</span>
<span class="pnDV">241</span>
</span>
</td>
<td> 80 </td>
<td style="text-align:left" class="country" rel="hk">
<span style="white-space:nowrap;">
<img src="/images/1x1.png" style="width: 16px; height: 11px; margin-right: 5px;" class="flags-hk" alt="flag "/>
Hong Kong
</span>
</td>
<td>
<div class="progress-indicator response_time" style="width: 114px" value="1165" levels="speed" rel="1165">
<div class="indicator" style="width: 88%; background-color: rgb(0, 173, 173)"></div>
</div>
</td>
<td>
<div class="progress-indicator connection_time" style="width: 114px" title="" rel="287" value="287" levels="speed">
<div class="indicator" style="width: 94%; background-color: rgb(0, 173, 173)"></div>
</div>
</td>
<td> HTTP </td>
<td nowrap> High +KA </td>
</tr>
I need to extract the visible text of every TD in this file, one line per table row. The result should look like this:
9mins 124.240.187.80 80 China HTTP High +KA
10mins 124.244.157.209 80 Hong Kong HTTP High +KA
I'm running into several problems getting this result.
The first is the non-standard markup, such as <style> and <div> elements nested inside a <span>.
The second is that the cell needs some live evaluation: the <style> tags inside each cell have to be applied to know which values are actually shown.
The <style> tags and the inline style attributes decide which elements are displayed and which are hidden (display:none).
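The visibility rule described above can be sketched in C# roughly as follows. This is a minimal illustration under assumptions drawn from the samples: the <style> block only holds single-class rules of the form .name{display:none} or .name{display:inline}, an inline display value overrides the class rules (as in normal CSS), and a class with no rule (such as "7" or "235") leaves the element visible. `CellVisibility`, `HiddenClasses` and `IsHidden` are hypothetical helper names, not CsQuery APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: evaluates the per-cell <style> rules and the inline style attribute.
public static class CellVisibility
{
    // Matches single-class rules such as ".NFK2{display:none}" or ".vFY2{display:inline}".
    private static readonly Regex Rule = new Regex(
        @"\.([\w-]+)\s*\{\s*display\s*:\s*(none|inline)\s*;?\s*\}", RegexOptions.IgnoreCase);

    // Captures the display value of an inline style attribute, e.g. "none" or "inline".
    private static readonly Regex InlineDisplay = new Regex(
        @"display\s*:\s*([a-z-]+)", RegexOptions.IgnoreCase);

    // Returns the class names hidden by the <style> block; a later rule for
    // the same class overrides an earlier one.
    public static HashSet<string> HiddenClasses(string css)
    {
        var hidden = new HashSet<string>(StringComparer.Ordinal);
        foreach (Match m in Rule.Matches(css))
        {
            if (m.Groups[2].Value.Equals("none", StringComparison.OrdinalIgnoreCase))
                hidden.Add(m.Groups[1].Value);
            else
                hidden.Remove(m.Groups[1].Value);
        }
        return hidden;
    }

    // An inline display value wins; otherwise the element is hidden if any of its
    // classes is hidden. Classes without a rule (e.g. "7", "235") count as visible.
    public static bool IsHidden(string styleAttr, string classAttr, ISet<string> hiddenClasses)
    {
        if (!string.IsNullOrEmpty(styleAttr))
        {
            Match display = InlineDisplay.Match(styleAttr);
            if (display.Success)
                return display.Groups[1].Value.Equals("none", StringComparison.OrdinalIgnoreCase);
        }
        if (string.IsNullOrEmpty(classAttr))
            return false;
        foreach (string cls in classAttr.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (hiddenClasses.Contains(cls))
                return true;
        }
        return false;
    }
}
```

With the first row's rules, `NFK2` and `Zb70` end up in the hidden set, so `<span class="NFK2">54</span>` is dropped while `<span class="7">240</span>` and `<span style="display: inline">187</span>` are kept.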
I'm using C# with CsQuery to extract these results, but so far without success:
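using CsQuery;

// text holds the downloaded HTML
CQ dom = CQ.Create(text);
CQ rows = dom.Select("table tr");
foreach (IDomObject item in rows)
{
    // Search inside the current row only (tr.Select did not limit the search to this row)
    CQ row = item.Cq();
    string lastCheck = row.Find("td:eq(0)").Text().Trim(); // 9mins
    string ip        = row.Find("td:eq(1)").Text().Trim(); // expected 124.240.187.80
    string port      = row.Find("td:eq(2)").Text().Trim(); // 80
    string country   = row.Find("td:eq(3)").Text().Trim(); // China
    string protocol  = row.Find("td:eq(6)").Text().Trim(); // HTTP
    string anonymity = row.Find("td:eq(7)").Text().Trim(); // High +KA
}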
The ip variable returns something like:
".Yj0s{display:none}\n.YSE7{display:inline}\n.zURn{display:none}\n.odWZ{display:inline}637891919292106106137183183183188245245254.85135.166.117177214214225"
If I change the ip variable to read the HTML instead:
string ip = tr.Select("td:eq(1)").Html();
it returns something like this:
" <span> <style>.PLBz{display:none}\n.hjVo{display:inline}</style><span class=\"PLBz\">92</span><div style=\"display:none\">92</div><span style=\"display:none\">114</span><span class=\"PLBz\">114</span><div style=\"display:none\">114</div><span class=\"hjVo\">122</span><span class=\"PLBz\">240</span><div style=\"display:none\">240</div>.96<span style=\"display:none\">175</span><span class=\"PLBz\">175</span><div style=\"display:none\">191</div><span style=\"display:none\">229</span><span class=\"PLBz\">229</span><div style=\"display:none\">229</div><span style=\"display:none\">241</span><span></span><span class=\"80\">.</span><div style=\"display:none\">22</div><span style=\"display:none\">38</span><div style=\"display:none\">38</div><span class=\"hjVo\">59</span><span class=\"PLBz\">156</span><div style=\"display:none\">156</div>.<span style=\"display:none\">18</span><span class=\"PLBz\">18</span><div style=\"display:none\">18</div><span class=\"PLBz\">45</span><div style=\"display:none\">45</div>104<span class=\"PLBz\">145</span><span></span><span style=\"display:none\">150</span><span class=\"PLBz\">150</span><div style=\"display:none\">150</div><span style=\"display:none\">178</span><div style=\"display:none\">178</div><span></span><span class=\"PLBz\">252</span><div style=\"display:none\">252</div> </span> "
How can I get the ip variable to show the correct, visible value (124.240.187.80 for the first row)?
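Since `.Html()` already returns the raw cell markup, one possible workaround is to evaluate the visibility rules yourself. The following is a regex-based sketch, not a CsQuery feature, and it assumes the cell looks like the samples in this question: one <style> block of single-class display:none/display:inline rules, visible text that sits either in leaf <span>/<div> elements or as bare text between them (like ".96" and "104" in the `.Html()` output), and no hidden container wrapping other elements. `IpCellDecoder` and `Decode` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: turns the markup of the IP cell (as returned by .Html())
// into the text a browser would display. Assumes the simple structure of the samples.
public static class IpCellDecoder
{
    private static readonly Regex StyleBlock = new Regex(
        @"<style[^>]*>(.*?)</style>", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    private static readonly Regex Rule = new Regex(
        @"\.([\w-]+)\s*\{\s*display\s*:\s*(none|inline)\s*;?\s*\}", RegexOptions.IgnoreCase);

    // Each match is one of: a leaf <span>/<div> holding plain text, any other tag
    // (such as the outer <span> wrapper, skipped), or bare text between tags.
    private static readonly Regex Token = new Regex(
        @"<(?<tag>span|div)\b(?<attrs>[^>]*)>(?<text>[^<]*)</\k<tag>>|<[^>]*>|(?<bare>[^<]+)",
        RegexOptions.IgnoreCase);

    private static readonly Regex StyleAttr = new Regex(@"\bstyle\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
    private static readonly Regex ClassAttr = new Regex(@"\bclass\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
    private static readonly Regex InlineDisplay = new Regex(@"display\s*:\s*([a-z-]+)", RegexOptions.IgnoreCase);

    public static string Decode(string cellHtml)
    {
        // 1. Classes hidden by the cell's own <style> block (a later rule wins).
        var hidden = new HashSet<string>(StringComparer.Ordinal);
        foreach (Match block in StyleBlock.Matches(cellHtml))
        {
            foreach (Match r in Rule.Matches(block.Groups[1].Value))
            {
                if (r.Groups[2].Value.Equals("none", StringComparison.OrdinalIgnoreCase))
                    hidden.Add(r.Groups[1].Value);
                else
                    hidden.Remove(r.Groups[1].Value);
            }
        }

        // 2. Remove the <style> blocks so the CSS text never reaches the output.
        string body = StyleBlock.Replace(cellHtml, string.Empty);

        // 3. Keep bare text and the text of visible leaf elements, in document order.
        var sb = new StringBuilder();
        foreach (Match t in Token.Matches(body))
        {
            if (t.Groups["bare"].Success)
                sb.Append(t.Groups["bare"].Value.Trim());
            else if (t.Groups["tag"].Success && !IsHidden(t.Groups["attrs"].Value, hidden))
                sb.Append(t.Groups["text"].Value.Trim());
        }
        return sb.ToString();
    }

    // An inline display value wins; otherwise any hidden class hides the element.
    private static bool IsHidden(string attrs, HashSet<string> hidden)
    {
        Match style = StyleAttr.Match(attrs);
        if (style.Success)
        {
            Match display = InlineDisplay.Match(style.Groups[1].Value);
            if (display.Success)
                return display.Groups[1].Value.Equals("none", StringComparison.OrdinalIgnoreCase);
        }
        Match cls = ClassAttr.Match(attrs);
        if (!cls.Success)
            return false;
        foreach (string c in cls.Groups[1].Value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (hidden.Contains(c))
                return true;
        }
        return false;
    }
}
```

To use it, pass the string returned by `.Html()` for the `td:eq(1)` cell to `IpCellDecoder.Decode`. For the first row of the sample it should yield 124.240.187.80, and for the `.Html()` output quoted above it yields 122.96.59.104. Regex-based HTML handling is fragile, so if the site changes its markup (nested hidden containers, multi-selector rules), walking the parsed DOM nodes instead would be the more robust choice.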