
I download a URL with IdHTTP.Get, and I need to search the HTML tags and extract some data.

How can I convert the string that IdHTTP.Get returns into an IHTMLDocument2?

Rob Kennedy
SadeghAlavizadeh
  • Thanks, but this article uses TWebBrowser, and I don't want to use a web browser control. I don't need HTML rendering, just the HTML text for extracting data, and speed is very important to me. – SadeghAlavizadeh Aug 11 '12 at 15:36

2 Answers


Try this one:

uses
  ... Variants, MSHTML, ActiveX; // ActiveX declares PSafeArray

var
  Cache: string;
  V: OleVariant;
  Doc: IHTMLDocument2;
begin
  ...

  Cache := IdHTTP.Get(url);
  Doc := coHTMLDocument.Create as IHTMLDocument2; // create IHTMLDocument2 instance
  V := VarArrayCreate([0, 0], varVariant);        // write expects a SAFEARRAY of variants
  V[0] := Cache;
  Doc.write(PSafeArray(TVarData(V).VArray));      // write data from IdHTTP
  Doc.close;                                      // close the output stream once everything is written

  // Work with Doc
end;
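
As a rough illustration of the "Work with Doc" step, here is a minimal sketch that collects the `href` of every `<a>` element. The `ExtractLinks` helper, the program name, and the sample HTML are made up for this example, and it assumes the MSHTML unit shipped with Delphi. A console program has to initialize COM itself, which is why `CoInitialize` is called here.

```delphi
program ExtractLinksDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, Variants, ActiveX, MSHTML;

// Hypothetical helper: parses Html with MSHTML and adds each <a href> to Links.
procedure ExtractLinks(const Html: WideString; Links: TStrings);
var
  Doc: IHTMLDocument2;
  V: OleVariant;
  Anchors: IHTMLElementCollection;
  Anchor: IHTMLElement;
  I: Integer;
begin
  Doc := coHTMLDocument.Create as IHTMLDocument2;
  V := VarArrayCreate([0, 0], varVariant);
  V[0] := Html;
  Doc.write(PSafeArray(TVarData(V).VArray));
  Doc.close;

  // tags() returns an IDispatch that wraps another element collection
  Anchors := Doc.all.tags('a') as IHTMLElementCollection;
  for I := 0 to Anchors.length - 1 do
  begin
    Anchor := Anchors.item(I, 0) as IHTMLElement;
    // getAttribute returns an OleVariant; VarToStr turns Null into ''
    Links.Add(VarToStr(Anchor.getAttribute('href', 0)));
  end;
end;

var
  Links: TStringList;
begin
  CoInitialize(nil);
  try
    Links := TStringList.Create;
    try
      // Sample input only; in the question this would be IdHTTP.Get(url)
      ExtractLinks('<html><body><a href="http://example.com/">x</a></body></html>', Links);
      Writeln(Links.Text);
    finally
      Links.Free;
    end;
  finally
    CoUninitialize;
  end;
end.
```

The interface variables are local to `ExtractLinks`, so they are released before `CoUninitialize` runs. The same pattern should work for other tags by changing the name passed to `tags`.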
Keeper

I Googled this problem and found some good code for it:

IDoc := CreateComObject(CLASS_HTMLDocument) as IHTMLDocument2;
try
  IDoc.designMode := 'on';
  while IDoc.readyState <> 'complete' do
    Application.ProcessMessages;
  V := VarArrayCreate([0, 0], varVariant);
  V[0] := MyHTML;
  IDoc.write(PSafeArray(System.TVarData(V).VArray));
  IDoc.designMode := 'off';
  while IDoc.readyState <> 'complete' do
    Application.ProcessMessages;

  ParseHTML(IDoc);
finally
  IDoc := nil;
end;

Regards

SadeghAlavizadeh
  • What about loading the document from a stream? `IdHTTP` has a `Get` overload that receives the response into a stream (it is actually what the overload returning a string uses). – TLama Aug 11 '12 at 16:01
  • I would not use that code. All the `designMode` switching and the `Application.ProcessMessages` loops that check `readyState` are not needed. You don't need to switch to `designMode=on` in order to write to an `IHTMLDocument`. I strongly suggest you use @Keeper's code. – kobik Dec 18 '12 at 17:32
  • @kobik, interestingly, both this and Keeper's code fail on the `IHTMLDocument2::write` line with `Invalid argument` on Windows 7, Delphi 7 (Personal) with an imported MSHTML type library. The very same happens with `PSafeArray(VarArrayAsPSafeArray(VarArrayOf([HTMLWideString])))`. – TLama Apr 21 '13 at 21:40
  • @TLama, I usually declare `document` as `OleVariant` and create it via late binding, e.g.: `document := CreateComObject(CLASS_HTMLDocument) as IDispatch` and simply use `document.write()`. Maybe this is why I never encountered this problem(?)... – kobik Apr 22 '13 at 08:42
  • @kobik, it works with the type library shipped with Delphi, but not if you import it yourself (Windows 7). Even stranger, the two are the same for the `IHTMLDocument2` interface. – TLama Apr 22 '13 at 08:46
  • @TLama Why won't `Doc.Write` execute JavaScript? –  Oct 08 '13 at 20:51
  • @TLama Can we chat? :) –  Oct 09 '13 at 08:11
  • @TLama It's regarding the EmbeddedWB memory and handle leaks –  Oct 09 '13 at 08:22