
I have a system that fetches an HTML page via GET every second to update its data. The HTML can contain 1 to 20 forms, and for each form I need to build a query string from its fields. I have a procedure that does this, but it takes longer than fetching the HTML from the server. What is wrong with the code, or how could I do this differently?

procedure XThread.GetForms;
var
  sTemp, xResF : String;
  FormItem, v: Variant;
  Field: Variant;
  J, q, i, contCampos,
  tmForm : Integer;
  IDocForm : IHTMLDocument2;
begin
 IDocForm := CreateComObject(Class_HTMLDOcument) as IHTMLDocument2;
 v := VarArrayCreate([0, 0], VarVariant);
 v[0] := strFormMAT; //string html
 IDocForm.Write(PSafeArray(System.TVarData(v).VArray));
 IDocForm.Close;

 tmForm := (IDocForm.all.tags('FORM') as IHTMLElementCollection).Length;
 SetLength(matFormsArray, 0); //matFormsArray = Global Array of Array
 SetLength(matFormsArray, tmForm);
 for q := 0 to tmForm -1 do
   begin
    SetLength(matFormsArray[q], 2);

    FormItem := (IDocForm.all.tags('FORM') as IHTMLElementCollection).item(q, 0);
    xResF := '';
    sTemp := FormItem.Name;
    contCampos := FormItem.Length;
     for j := 0 to contCampos - 1 do
      begin
        Field := FormItem.Item(j);
        xResF := xResF + Field.Name + '=' + Field.Value;
        if j < FormItem.Length - 1 then
          xResF := xResF + '&';
      end;

      matFormsArray[q, 0] := sTemp;
      matFormsArray[q, 1] := xResF;
   end;
end;

strFormMAT =

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>


<body>

 <form name="fprin" id="fprin">
   <input type="hidden" name="field11" value="value11"></input>
   <input type="hidden" name="field12" value="value12"></input>
   <input type="hidden" name="field13" value="value13"></input>
 </form>

 <table>
  <tr>
   <td>Title</td>
   <td>Title</td>
   <td>Title</td>
   <td>Title</td>
  </tr>

  <tr>
   <td>Value</td>
   <td>Value</td>
   <td>Value</td>
   <form name="xxx1" id="xxx1">
   <td>Value</td>
   <td>Value</td>
   <input type="hidden" name="field11" value="value11"></input>
   <input type="hidden" name="field12" value="value12"></input>
   <input type="hidden" name="field13" value="value13"></input>
   <input type="hidden" name="field14" value="value14"></input>
   </form>
  </tr>

  <tr>
   <td>Value</td>
   <td>Value</td>
   <td>Value</td>
   <form name="xxx2" id="xxx2">
   <td>Value</td>
   <td>Value</td>
   <input type="hidden" name="field21" value="value21"></input>
   <input type="hidden" name="field22" value="value22"></input>
   <input type="hidden" name="field23" value="value23"></input>
   <input type="hidden" name="field24" value="value24"></input>
   </form>
  </tr>

  <tr>
   <td>Value</td>
   <td>Value</td>
   <td>Value</td>
   <form name="xxx3" id="xxx3">
   <td>Value</td>
   <td>Value</td>
   <input type="hidden" name="field31" value="value31"></input>
   <input type="hidden" name="field32" value="value32"></input>
   <input type="hidden" name="field33" value="value33"></input>
   <input type="hidden" name="field34" value="value34"></input>
   </form>
  </tr>

  <tr>
   <td>Value</td>
   <td>Value</td>
   <td>Value</td>
   <form name="xxx4" id="xxx4">
   <td>Value</td>
   <td>Value</td>
   <input type="hidden" name="field41" value="value41"></input>
   <input type="hidden" name="field42" value="value42"></input>
   <input type="hidden" name="field43" value="value43"></input>
   <input type="hidden" name="field44" value="value44"></input>
   </form>
  </tr>

  <tr>
   <td>Value</td>
   <td>Value</td>
   <td>Value</td>
   <form name="xxx5" id="xxx5">
   <td>Value</td>
   <td>Value</td>
   <input type="hidden" name="field51" value="value51"></input>
   <input type="hidden" name="field52" value="value52"></input>
   <input type="hidden" name="field53" value="value53"></input>
   <input type="hidden" name="field54" value="value54"></input>
   </form>
  </tr>

  <tr>
   <td>Value</td>
   <td>Value</td>
   <td>Value</td>
   <form name="xxx6" id="xxx6">
   <td>Value</td>
   <td>Value</td>
   <input type="hidden" name="field61" value="value61"></input>
   <input type="hidden" name="field62" value="value62"></input>
   <input type="hidden" name="field63" value="value63"></input>
   <input type="hidden" name="field64" value="value64"></input>
   </form>
  </tr>


 </table>

</body>

</html>

I call the procedure from within a thread using Synchronize(GetForms); and even then it crashes and is slow.

My problem is not fetching the HTML, which is already up and running. The problem is extracting the forms from the HTML: this procedure is slow.

Jason-X
  • Move your parser procedure out of the thread class. Make it a method of the TForm, and call it using "Synchronize". – The North Star Nov 13 '15 at 14:43
  • You are using a DOM-based parser, which is inherently slow since it has to parse the HTML and create a tree of objects in memory for every HTML element. But more importantly, you are calling `IDocForm.all.tags('FORM')` too many times: once to discover the number of `<form>` elements, and then again on every loop iteration. Don't do that, it is wasted overhead. Call it once and save the resulting `IHTMLElementCollection` to a variable, and then call `item()` on that variable as needed. – Remy Lebeau Nov 13 '15 at 19:41
  • In any case, you are looking for specific things in the HTML, so you should consider just removing the DOM parser altogether and use simple substring search+extract operations instead. That would be much faster. Or at least switch to a SAX-based, or better a Reader-based, HTML parser instead of a DOM-based parser. – Remy Lebeau Nov 13 '15 at 19:42
  • It is probably best to use a pure Pascal HTML parser. Sadly, I don't know of any free one, just this paid one: DIHtmlParser – smooty86 Nov 13 '15 at 21:44
  • Thanks guys, I'll look for another way to work with the HTML. It really is the slowest part of the application, and it is used in several places. If anyone has any tips, thank you! – Jason-X Nov 14 '15 at 19:01
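
The advice in Remy Lebeau's first comment (query the DOM once and reuse the collection) could look roughly like the sketch below. It keeps the MSHTML calls from the question and only changes where the collection is fetched. The function name ExtractForms and the type TFormsArray are hypothetical names introduced here, and the unit names assume a project where the unscoped names (ComObj, ActiveX and so on) resolve. It has not been timed against the original, so treat any speed-up as something to measure.

```pascal
uses
  SysUtils, Variants, ComObj, ActiveX, MSHTML;

type
  // Hypothetical type: one row per form, [q, 0] = form name, [q, 1] = query string
  TFormsArray = array of array of string;

// Hypothetical standalone variant of GetForms from the question.
function ExtractForms(const Html: string): TFormsArray;
var
  Doc: IHTMLDocument2;
  V, FormItem, Field: Variant;
  Forms: IHTMLElementCollection;
  FormCount, FieldCount, q, j: Integer;
  Query: string;
begin
  // Load the HTML string into an in-memory document, as in the question
  Doc := CreateComObject(Class_HTMLDocument) as IHTMLDocument2;
  V := VarArrayCreate([0, 0], varVariant);
  V[0] := Html;
  Doc.write(PSafeArray(TVarData(V).VArray));
  Doc.close;

  // Ask the DOM for the FORM collection once and keep it in a variable
  Forms := Doc.all.tags('FORM') as IHTMLElementCollection;
  FormCount := Forms.length;
  SetLength(Result, FormCount);

  for q := 0 to FormCount - 1 do
  begin
    FormItem := Forms.item(q, 0);     // reuse the cached collection
    FieldCount := FormItem.Length;    // read the field count once per form
    Query := '';
    for j := 0 to FieldCount - 1 do
    begin
      Field := FormItem.Item(j);
      if j > 0 then
        Query := Query + '&';
      Query := Query + Field.Name + '=' + Field.Value;
    end;
    SetLength(Result[q], 2);
    Result[q, 0] := FormItem.Name;
    Result[q, 1] := Query;
  end;
end;
```

Note that Synchronize runs the method in the main thread, so calling the parser through it still blocks the UI for as long as the parsing takes. Running it inside the worker thread instead would require COM to be initialized in that thread (for example with CoInitialize), which is an assumption about the setup and not something shown in the question.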

1 Answer


It's not entirely clear from your question what you're trying to do, but it looks like you should consider a different technique than loading this custom HTML into an HTMLDocument to do the work.

If what you need is to call a certain URL with a query string built from fields and values, you should look into TIdHTTP or one of the other components that perform an HTTP call directly (or indirectly).

Stijn Sanders
  • Hello, my problem is not fetching the HTML, which is already up and running. The problem is extracting the forms from the HTML: this procedure is slow. – Jason-X Nov 13 '15 at 15:23
  • I added a sample of the HTML from which I need to extract the forms and fields. – Jason-X Nov 13 '15 at 17:27
  • It's still absolutely unclear to me what you're trying to do with this. Even less so now that it looks like I was wrong to assume you're trying to submit the forms. – Stijn Sanders Nov 16 '15 at 10:11
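
For the extraction step the question is actually about, the substring search+extract idea from the comments on the question could be sketched without any DOM at all, as below. This is a minimal illustration, not a general HTML parser: it assumes the markup looks like the sample in the question (double-quoted attributes, fields written as `<input ...>` tags, no `>` inside attribute values), and like the original code it does not URL-encode names or values. The names AttrValue, ScanForms and TFormsArray are hypothetical.

```pascal
program FormScan;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils, StrUtils;

type
  // Hypothetical type: [n, 0] = form name, [n, 1] = query string
  TFormsArray = array of array of string;

// Returns the text of Attr="..." inside one tag, or '' if the attribute is absent.
function AttrValue(const Tag, Attr: string): string;
var
  P, Q: Integer;
begin
  Result := '';
  P := Pos(' ' + LowerCase(Attr) + '="', LowerCase(Tag));
  if P = 0 then
    Exit;
  P := P + Length(Attr) + 3;  // skip the space, the name, '=' and the opening quote
  Q := PosEx('"', Tag, P);
  if Q > 0 then
    Result := Copy(Tag, P, Q - P);
end;

// Scans Html once and builds one name/query-string pair per <form>.
function ScanForms(const Html: string): TFormsArray;
var
  Lower, Tag, Query: string;
  FormPos, FormEnd, TagEnd, P, N: Integer;
begin
  Result := nil;
  Lower := LowerCase(Html);  // ASCII-only lowercasing, so offsets match Html
  FormPos := PosEx('<form', Lower, 1);
  while FormPos > 0 do
  begin
    TagEnd := PosEx('>', Lower, FormPos);
    FormEnd := PosEx('</form', Lower, FormPos);
    if (TagEnd = 0) or (FormEnd = 0) then
      Break;  // unterminated form: stop rather than guess
    N := Length(Result);
    SetLength(Result, N + 1);
    SetLength(Result[N], 2);
    Result[N, 0] := AttrValue(Copy(Html, FormPos, TagEnd - FormPos + 1), 'name');
    Query := '';
    P := PosEx('<input', Lower, TagEnd);
    while (P > 0) and (P < FormEnd) do
    begin
      TagEnd := PosEx('>', Lower, P);
      if TagEnd = 0 then
        Break;
      Tag := Copy(Html, P, TagEnd - P + 1);
      if Query <> '' then
        Query := Query + '&';
      Query := Query + AttrValue(Tag, 'name') + '=' + AttrValue(Tag, 'value');
      P := PosEx('<input', Lower, TagEnd);
    end;
    Result[N, 1] := Query;
    FormPos := PosEx('<form', Lower, FormEnd);
  end;
end;

var
  Forms: TFormsArray;
  I: Integer;
begin
  Forms := ScanForms(
    '<form name="xxx1" id="xxx1">' +
    '<input type="hidden" name="field11" value="value11"></input>' +
    '<input type="hidden" name="field12" value="value12"></input></form>');
  for I := 0 to High(Forms) do
    WriteLn(Forms[I, 0], ': ', Forms[I, 1]);  // prints "xxx1: field11=value11&field12=value12"
end.
```

Because this works on plain strings, it needs no COM object and no Synchronize call, so it could run entirely inside the worker thread. The trade-off is that it only understands the narrow markup shape described above; select or textarea fields, single-quoted attributes, or nested forms would need extra handling.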