Using Delphi Rio, I am traversing the nodes of a document with an HTML/DOM parser, which returns the tags and attributes of each node. Normally this is not a problem, but for some tags the returned string contains several attributes at once. I need to split this string into some type of container, such as a StringList. The attribute string the parser returns already has the '<' and '>' removed.
Some examples of attribute strings are:
data-partnumber="BB3312" class=""
class="cb10"
account_number = "11432" model = "pay_plan"
The end result I want is a StringList containing one or more name=value pairs. I have not used RegEx to any real degree, but I think it is the right tool here. Would this be a valid approach? For the RegEx pattern, I think I want:
\w\s?=\s?\"[^"]+"
To find multiple matches within one string, I would use TRegEx.Matches. Am I overlooking something here that will cause me issues later on?
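A minimal sketch of the TRegEx.Matches approach described above, under stated assumptions. ParseAttributes is a hypothetical helper, not part of any parser. The pattern differs from the one proposed above in three ways: `[\w-]+` instead of `\w`, so that a whole name (including hyphenated ones like data-partnumber) is captured; `[^"]*` instead of `[^"]+`, so that empty values like class="" still match; and `\s*` instead of `\s?` around the equals sign. It assumes every value is double-quoted and does not handle single-quoted, unquoted, or valueless attributes.

```delphi
program ParseAttrs;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.RegularExpressions;

// Hypothetical helper: splits an attribute string into Name=Value pairs
// and appends them to Dest. Assumes all values are double-quoted.
procedure ParseAttributes(const AttrText: string; Dest: TStrings);
const
  // Group 1: attribute name (word characters and hyphens).
  // Group 2: attribute value without the quotes (may be empty).
  Pattern = '([\w-]+)\s*=\s*"([^"]*)"';
var
  M: TMatch;
begin
  for M in TRegEx.Matches(AttrText, Pattern) do
    // Add is used instead of Dest.Values[Name] := Value, because assigning
    // an empty string through Values removes the entry.
    Dest.Add(M.Groups[1].Value + '=' + M.Groups[2].Value);
end;

var
  Attrs: TStringList;
  I: Integer;
begin
  Attrs := TStringList.Create;
  try
    ParseAttributes('account_number = "11432" model = "pay_plan"', Attrs);
    ParseAttributes('data-partnumber="BB3312" class=""', Attrs);
    // Print each pair as name -> "value".
    for I := 0 to Attrs.Count - 1 do
      Writeln(Attrs.Names[I], ' -> "', Attrs.ValueFromIndex[I], '"');
  finally
    Attrs.Free;
  end;
end.
```

Storing the whitespace-normalized `name=value` form means the usual TStrings accessors (Names, ValueFromIndex) work on the result, even when the source string had spaces around the equals sign.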
*** ADDITIONAL INFO *** Several people have suggested using a decent parser. I am currently using an open-source HTML/DOM parser (https://github.com/sandbil/HTML-Parser). In light of that, here is more information. The HTML snippet I am parsing is shown below; look at the line I have marked with ***** at the end. My parser returns the attributes of that div as
Node.AttributeText= 'data-partnumber="B92024" data-model="pay_as_you_go" class="" '
Would a different HTML DOM parser return this as three separate attributes? If so, can someone recommend a parser?
<section class="cc02 cc02v0" data-trackas="cc02" data-ocomid="cc02">
<div class="cc02w1">
<div class="otable otable-scrolling">
<div class="otable-w1">
<table class="otable-w2">
<thead>
<tr>
<th>Product</th>
<th>Unit Price</th>
<th>Metric</th>
</tr>
</thead>
<tbody>
<tr>
<td class="cb152title"><div>MySQL Database for HeatWave-Standard-E3</div></td>
<td><div data-partnumber="B92024" data-model="pay_as_you_go" class="">$0.3536<span></span></div></td> *****
<td><div>Node per hour</div></td>
</tr>
<tr data-partnumber="B92426">
<td class="cb152title">MySQL Database—Storage</td>
<td><span data-model="pay_as_you_go" class="">$0.04<span></span></span></td>
<td>Gigabyte storage capacity per month</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</section>