
I am using TWebBrowser in design mode (Doc.DesignMode := 'On') to compose an HTML document. No document (HTML file on disk) is loaded in TWebBrowser; I create the document from scratch directly in TWebBrowser. The HTML code is then extracted from TWebBrowser and saved as c:/MyProjects/SomeHtmlName.html.

The problem is that it won't show the images I insert if they have a relative path.

More exactly, if I paste this code in the WebBrowser it will instantly display the image:

<IMG src="file:///c:/MyProjects/resources/R.PNG">

However, if I enter:

<IMG border=0 src="resources\R.PNG">
<IMG border=0 src="resources/R.PNG">   <---- preferred so it will work on Linux

it will display an image placeholder instead of the actual image.

I need relative paths so the web site will still work if I change the root path or upload it via FTP.


procedure TForm1.Button1Click(Sender: TObject);
begin
  LoadDummyPage;
  SetHtmlCode('<img src="resources/test_img.PNG">');
  Memo1.Text:= GetHtmlCode;
end;



function TForm1.LoadDummyPage: Boolean;
const FileName: string= 'c:\MyProject\_ONLINE WEB SITE\dummy.html';
begin
  if not Assigned(wbBrowser.Document)
  then wbBrowser.Navigate('about:blank');

  Result := FileExists(FileName);
  if Result
  then wbBrowser.Navigate('file://' + FileName)
  else Caption:= 'file not found';
end;



procedure TForm1.SetHtmlCode(const HTMLCode: string);
var
  Doc: Variant;
begin
  if not Assigned(wbBrowser.Document)
  then wbBrowser.Navigate('about:blank');

  Doc := wbBrowser.Document;
  Doc.Write(HTMLCode);
  Doc.Close;
  Doc.DesignMode := 'On';

  WHILE wbBrowser.ReadyState < READYSTATE_INTERACTIVE
   DO Application.ProcessMessages;

  Doc.body.style.fontFamily := 'Arial';
  Doc.Close;
end;


function TForm1.GetHtmlCode: string;             { Get the HTML code from the browser }
var
  Doc: IHTMLDocument2;
  BodyElement: IHTMLElement;
begin
  Result := '';  { Make sure the result is defined when there is no document or body }
  if Assigned(wbBrowser.Document) and (wbBrowser.Document.QueryInterface(IHTMLDocument2, Doc) = S_OK) then
  begin
    BodyElement := Doc.body;
    if Assigned(BodyElement) then
      Result := BodyElement.innerHTML;
  end;

  if Result <> ''
  then Result := StringReplace(Result, '="about:', '="', [rfReplaceAll, rfIgnoreCase]);  { Fix the 'How stop TWebBrowser from adding 'file:///' in front of my links' bug }
end;
Gabriel
  • Does it work if the current working directory of the executable is c:/MyProjects/? – mjn42 Feb 02 '17 at 13:41
  • The 'hack' I could use would be to post-process the HTML and remove the 'file:///c:/MyProjects/' – Gabriel Feb 02 '17 at 13:41
  • @mjn42 - nope. probably because the images are in 'Resources' folder? Plus, I think that the 'root' folder of TWebBrowser is Internet Explorer's default root folder (whatever this might be) and not app's folder. – Gabriel Feb 02 '17 at 13:42
  • see http://stackoverflow.com/questions/18441233/html-base-tag-referring-to-local-folder – mjn42 Feb 02 '17 at 15:39
  • @mjn42-Not working for me because of "catch 22". You can use the BASE tag ONLY if you load a document FROM FILE in WebBrowser. Details here: http://stackoverflow.com/questions/2686774/how-to-set-delphi-webbrowser-base-directory-different-that-html-location – Gabriel Feb 02 '17 at 20:55
  • @DarkPresidentOfAmerica, you are wrong. You need to pre-load an HTML "template" string/stream including the BASE tag where you set the desired path (with trailing slash), e.g. `"file:///c:/MyProjects/"`, and switch to edit mode, where your image src should be relative, e.g. `"resources/R.PNG"`. Your final extracted HTML after editing should be the `body.innerHTML`. Wrap it with valid HTML/Body WITHOUT the BASE tag and save to disk. – kobik Feb 02 '17 at 23:46
  • @kobik-You mean to load an "empty" HTML dummy file like this: c:/MyProjects/dummy.html ? It will probably work. Can you post this as Answer so I can accept it? – Gabriel Feb 03 '17 at 10:31
  • You could. But what I meant was document.write a template HTML – kobik Feb 03 '17 at 11:35

2 Answers


You need to pre-load a valid HTML "template" string/stream including the BASE tag where you set the desired path (with trailing slash) e.g. "file:///c:/MyProjects/".

Then switch to edit mode, where your image SRC values should be relative, e.g. "resources/R.PNG". Your final extracted HTML after editing should be the body.innerHTML or body.outerHTML (whatever you need). You can even take the whole document source (google it).

Wrap the extracted source with valid HTML/Body WITHOUT the BASE tag and save to disk at c:\MyProjects.
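The wrap-and-save step could look roughly like the sketch below. The names `PAGE_TEMPLATE` and `SaveBodyAsPage` and the output file name are hypothetical, chosen only for illustration, and the wrapper markup is just one minimal possibility. The point is that the saved page has no BASE tag, so relative URLs resolve against wherever the file ends up.

```pascal
program WrapAndSave;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Hypothetical wrapper for the saved file; note that it contains no BASE tag.
  PAGE_TEMPLATE = '<html><head></head><body>%s</body></html>';

// Wraps the extracted body markup in a minimal page and writes it to FileName.
procedure SaveBodyAsPage(const BodyHtml, FileName: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Text := Format(PAGE_TEMPLATE, [BodyHtml]);
    Lines.SaveToFile(FileName);  // uses the default encoding of TStringList
  finally
    Lines.Free;
  end;
end;

begin
  // In the real application BodyHtml would come from Doc.body.innerHTML.
  SaveBodyAsPage('<b>Hello</b><img src="resources/1.png">', 'SomeHtmlName.html');
end.
```

If the document contains non-ANSI characters, you would probably want to pass an explicit encoding to `SaveToFile` on Unicode versions of Delphi.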

> But the resulting code for IMG SRC is a full path!

There is not much you can do about it. This is how the DOM represents the HTML, which is not necessarily identical to the HTML source code. The behavior is not consistent, and it also depends on how you insert images (I do not use execCommand; I have my own dialog and insert my own HTML code). You need to manually replace "file:///c:/MyProjects/" in the extracted source with an empty string. At least, this is how I do it.
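A minimal sketch of that replacement, assuming the base URL appears in the extracted source exactly as it was written in the BASE tag (the helper name `StripBaseUrl` is hypothetical):

```pascal
program StripBase;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Removes every occurrence of BaseUrl so that src/href values become relative again.
function StripBaseUrl(const Html, BaseUrl: string): string;
begin
  Result := StringReplace(Html, BaseUrl, '', [rfReplaceAll, rfIgnoreCase]);
end;

begin
  // Prints: <IMG src="resources/R.PNG">
  Writeln(StripBaseUrl('<IMG src="file:///c:/MyProjects/resources/R.PNG">',
    'file:///c:/MyProjects/'));
end.
```

This plain string replacement does not handle URL-encoded paths (for example a folder name containing spaces), so such cases would need extra decoding before the comparison.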

Edit: You don't need to Navigate() to an external file. You can write the "template"/"empty" HTML via document.write(HTML).

Try this:

const
  HTML_TEMPLATE = '<html><head><base href="file:///%s"></head><body style="font-family:Arial">%s</body></html>';

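procedure TForm1.LoadHTML(HTMLCode: string);
var
  Doc: Variant;
  HTML, Path: string;
begin
  Path := 'D:\Temp\';  // base folder for relative URLs; keep the trailing slash
  HTML := Format(HTML_TEMPLATE, [Path, HTMLCode]);
  WebBrowser1.Navigate('about:blank');
  // Navigate is asynchronous: wait until the blank document is available
  while WebBrowser1.ReadyState < READYSTATE_COMPLETE do
    Application.ProcessMessages;
  Doc := WebBrowser1.Document;
  Doc.Write(HTML);
  Doc.Close;
  Doc.DesignMode := 'On';
end;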

procedure TForm1.Button1Click(Sender: TObject);
begin
  LoadHTML('<b>Hello</b><img SRC="resources/1.png">');
end;

procedure TForm1.Button2Click(Sender: TObject);
var
  Doc: IHTMLDocument2;
begin
  Doc := WebBrowser1.Document as IHTMLDocument2;
  if Assigned(Doc) then
  begin
    ShowMessage(Doc.body.innerHTML); 
  end;
end;

The output for me is: <B>Hello</B><IMG src="resources/1.png">. In some cases the src might contain the full path. I can't be 100% sure when this happens, but you need to be ready to deal with this situation by manually replacing the path. There is no conclusive documentation about this, so I always handle it regardless.

kobik
  • "You need to manually replace..." - But this is what I am ALREADY doing (see the comment I put under the Question) :) I was providing the FULL image path, then 'post-processing' the HTML code to convert it from full to relative. And all this without having to use the complicated 'BASE'. I considered this a 'hack' and hoped that I could make TWebBrowser use relative paths for real :) – Gabriel Feb 03 '17 at 11:54
  • @DarkPresidentOfAmerica, "hoped that I could make TWebBrowser use relative paths for real". But it *does* when you use the BASE! :) The `innerHTML` is the representation of how the DOM actually parsed the source code. I know the replacement is "hacky", but there is not much you can do about it AFAIK. – kobik Feb 03 '17 at 12:29
  • Then, in order to deal with that I will simply use full paths all the way and post process the HTML code to convert to relative. – Gabriel Feb 03 '17 at 14:50
  • What you say makes sense depending on your usage. But if you need to reuse the output HTML you will also need to pre-process it, which is a bit of a pain without the base tag. – kobik Feb 03 '17 at 15:17
  • Yes. It actually is quite some pain... you need to extract all paths from SRC, you need to decode the URL, convert it to DOS file name, make it relative, convert it back to Linux path, and re-encode it. But I have done it :) – Gabriel Feb 03 '17 at 15:29

I suggest using a small embedded web server such as Internet Direct (Indy) TIdHTTPServer, which can serve all HTTP requests in a standard way. This removes all potential file system trouble.
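As a rough sketch of this idea, assuming Indy 10 is installed: the program below serves files from a root folder over HTTP on localhost only. The root folder, the port 8080, and the `TFileHandler` class are assumptions made for illustration, and the path check is deliberately simplistic.

```pascal
program LocalImageServer;

{$APPTYPE CONSOLE}

uses
  SysUtils, IdContext, IdSocketHandle, IdCustomHTTPServer, IdHTTPServer;

type
  // Hypothetical helper object that owns the request handler.
  TFileHandler = class
  public
    RootDir: string;
    procedure CommandGet(AContext: TIdContext;
      ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
  end;

procedure TFileHandler.CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  FileName: string;
begin
  // Map a request such as '/resources/R.PNG' to '<RootDir>resources\R.PNG'.
  FileName := RootDir + StringReplace(Copy(ARequestInfo.Document, 2, MaxInt),
    '/', '\', [rfReplaceAll]);
  // Reject parent-directory references and missing files.
  if (Pos('..', FileName) > 0) or not FileExists(FileName) then
    AResponseInfo.ResponseNo := 404
  else
    AResponseInfo.ServeFile(AContext, FileName);
end;

var
  Server: TIdHTTPServer;
  Handler: TFileHandler;
begin
  Handler := TFileHandler.Create;
  Server := TIdHTTPServer.Create(nil);
  try
    Handler.RootDir := 'c:\MyProjects\';  // assumed site root, with trailing backslash
    Server.OnCommandGet := Handler.CommandGet;
    with Server.Bindings.Add do
    begin
      IP := '127.0.0.1';  // listen on localhost only
      Port := 8080;       // hypothetical port
    end;
    Server.Active := True;
    Writeln('Serving on http://127.0.0.1:8080/ - press Enter to stop');
    Readln;
    Server.Active := False;
  finally
    Server.Free;
    Handler.Free;
  end;
end.
```

With something like this running, the BASE href (or the navigated URL) would point at `http://127.0.0.1:8080/` instead of a `file:///` path, and relative image URLs would be requested from the server.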

mjn42
  • Thanks for the suggestion, but this hack is more extreme than mine :) Think about the implications of installing a web server on a computer. Most antivirus programs will have something to say about that :) – Gabriel Feb 02 '17 at 14:16
  • Bind it to localhost (127.0.0.1) only then. Many processes open local ports and AV does not complain. They may complain on port scans. – mjn42 Feb 02 '17 at 14:37