4

I want to write a program that downloads a page from the internet and parses it. The second part is easy; the first is the problem.

I want to use the URLDownloadToFile() function, but by default it doesn't wait for the download to complete. MSDN says the last parameter is some sort of callback, but I can't find any information on how to use it: when it is called, what it must do, or even what type of function it is. Can someone explain what that last parameter is and how to use it (in C++) to make my app wait?

Xirdus
  • Why not just use URLOpenBlockingStream if async doesn't work for you? – Eugen Constantin Dinca Mar 13 '11 at 18:35
  • Because I know even less about it. I guess (based on the mention of IStream) it will require managed C++, which I am not familiar with. – Xirdus Mar 13 '11 at 18:59
  • 2
    Xirdus, using interfaces (such as IStream) does not require using managed C++. Windows was using interfaces long before .Net was invented. COM makes heavy use of interfaces, and the Internet Explorer APIs make heavy use of COM. URLDownloadToFile itself has two interface parameters. – Rob Kennedy Mar 13 '11 at 19:25
  • 1
    I'm pretty sure this function is synchronous... – Rob Mar 13 '11 at 20:02

5 Answers

11

You have to create a class that implements the IBindStatusCallback interface. You can return E_NOTIMPL for most of the methods. Use OnProgress() to show progress. Here's a sample program that gets this done:

#include "stdafx.h"
#include <windows.h>
#include <urlmon.h>   // URLDownloadToFile, IBindStatusCallback
#include <tchar.h>    // _tmain, _TCHAR
#include <iostream>
#pragma comment(lib, "urlmon.lib")
using namespace std;

class DownloadProgress : public IBindStatusCallback {
public:
    HRESULT __stdcall QueryInterface(const IID &,void **) { 
        return E_NOINTERFACE;
    }
    ULONG STDMETHODCALLTYPE AddRef(void) { 
        return 1;
    }
    ULONG STDMETHODCALLTYPE Release(void) {
        return 1;
    }
    HRESULT STDMETHODCALLTYPE OnStartBinding(DWORD dwReserved, IBinding *pib) {
        return E_NOTIMPL;
    }
    virtual HRESULT STDMETHODCALLTYPE GetPriority(LONG *pnPriority) {
        return E_NOTIMPL;
    }
    virtual HRESULT STDMETHODCALLTYPE OnLowResource(DWORD reserved) {
        return S_OK;
    }
    virtual HRESULT STDMETHODCALLTYPE OnStopBinding(HRESULT hresult, LPCWSTR szError) {
        return E_NOTIMPL;
    }
    virtual HRESULT STDMETHODCALLTYPE GetBindInfo(DWORD *grfBINDF, BINDINFO *pbindinfo) {
        return E_NOTIMPL;
    }
    virtual HRESULT STDMETHODCALLTYPE OnDataAvailable(DWORD grfBSCF, DWORD dwSize, FORMATETC *pformatetc, STGMEDIUM *pstgmed) {
        return E_NOTIMPL;
    }        
    virtual HRESULT STDMETHODCALLTYPE OnObjectAvailable(REFIID riid, IUnknown *punk) {
        return E_NOTIMPL;
    }

    virtual HRESULT __stdcall OnProgress(ULONG ulProgress, ULONG ulProgressMax, ULONG ulStatusCode, LPCWSTR szStatusText)
    {
        wcout << ulProgress << L" of " << ulProgressMax;
        if (szStatusText) wcout << " " << szStatusText;
        wcout << endl;
        return S_OK;
    }
};


int _tmain(int argc, _TCHAR* argv[])
{
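    DownloadProgress progress;
    HRESULT hr = URLDownloadToFileW(0, 
        L"http://sstatic.net/stackoverflow/img/sprites.png?v=3", 
        L"c:/temp/test.png", 0,
        static_cast<IBindStatusCallback*>(&progress));
    // Report a failed download instead of silently ignoring the result.
    if (FAILED(hr)) {
        wcout << L"Download failed, HRESULT 0x" << hex << static_cast<unsigned long>(hr) << endl;
        return 1;
    }
    return 0;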
}

Output:

0 of 0 sstatic.net
0 of 0 64.34.119.12
0 of 0
0 of 0 image/x-png
3550 of 16542 http://sstatic.net/stackoverflow/img/sprites.png?v=3
3550 of 16542 C:\Users\hpassant\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.IE5\NRPH4KHK\sprites[1].png
7330 of 16542 http://sstatic.net/stackoverflow/img/sprites.png?v=3
8590 of 16542 http://sstatic.net/stackoverflow/img/sprites.png?v=3
12370 of 16542 http://sstatic.net/stackoverflow/img/sprites.png?v=3
13630 of 16542 http://sstatic.net/stackoverflow/img/sprites.png?v=3
16542 of 16542 http://sstatic.net/stackoverflow/img/sprites.png?v=3
Hans Passant
  • +1 it helped. Also, is it possible to handle file download which happens because of form submit, for example, http://www.sciencedirect.com/science?_ob=DownloadURL&_method=confirm&_eidkey=1-s2.0-S0278584610003544&count=1&_docType=FLA&zone=toolbar&_acct=C000228598&_version=1&_userid=10&md5=772758ab8eaf9802e155951721c1ff64 thanks. – Favonius Apr 08 '13 at 18:29
  • I guess, the line `IBindStatusCallback* callback = (IBindStatusCallback*)&progress;` can be removed, right? – honk Jan 22 '17 at 19:28
9

The function probably returns immediately because of an error.

URLDownloadToFile() is definitely a synchronous function if you pass NULL for the LPBINDSTATUSCALLBACK lpfnCB parameter.

It is so "synchronous" that it will not return until the download completes, even if the network connection fails, and it will block your thread in the meantime. Killing a thread that has URLDownloadToFile() in progress with TerminateThread() leaks resources and leaves child calls into system DLLs unfinished; after a couple of such kills, URLDownloadToFile() will refuse to work in the current process.

The only reliable way to use URLDownloadToFile() without a callback is to run it in a separate process and kill that process if the download stalls, which is resource-consuming.

URLDownloadToFile() downloads exactly the same way IE does: all IE proxy and network settings of the user profile under which the function runs apply to it as well.

Also, URLDownloadToFile() doesn't return immediately even with a callback. I recommend running URLDownloadToFile() in a separate thread so that you can safely monitor and abort the download.

There is a simple example of a callback at https://github.com/choptastic/OldCode-Public/blob/master/URLDownloadToFile/URLDownloadToFile.cpp

For a safe download, extend that code with at least something like:

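private:
    // Written by the download thread in OnProgress() and read or set by the
    // controlling thread, hence volatile.
    volatile LONG progress, filesize;
    volatile LONG AbortDownload;

public:

    STDMETHOD(OnStartBinding)(DWORD /*dwReserved*/, IBinding* /*pib*/)
    {
        AbortDownload = 0;
        progress = 0;
        filesize = 0;
        return E_NOTIMPL;
    }

    // Plain accessors rather than COM methods: they return values, not HRESULTs.
    LONG GetProgress() const { return progress; }
    LONG GetFileSize() const { return filesize; }

    // Request cancellation; it takes effect at the next OnProgress() call.
    void AbortDownl() { AbortDownload = 1; }

HRESULT DownloadStatus::OnProgress ( ULONG ulProgress, ULONG ulProgressMax, ULONG ulStatusCode, LPCWSTR wszStatusText )
{
    progress = ulProgress;
    filesize = ulProgressMax;
    if (AbortDownload) return E_ABORT;   // tells urlmon to cancel the transfer
    return S_OK;
}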

This way you can always abort the download and check its progress.
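A minimal sketch of how the pieces above could fit together, assuming C++11 and the wide-character URLDownloadToFileW: the blocking call runs on a worker thread, and the supervising thread requests an abort once a deadline passes. The names DownloadControl, RecordProgress, AbortableCallback and DownloadWithDeadline are hypothetical and do not come from the linked code. The Windows-specific part is wrapped in #ifdef _WIN32 so that the progress/abort logic can be compiled and tested on its own. The abort only takes effect the next time OnProgress() is called, so a completely stalled connection may still block.

```cpp
#include <atomic>
#include <chrono>
#include <future>

// Shared between the download thread and the thread that supervises it.
struct DownloadControl {
    std::atomic<unsigned long> progress{0};
    std::atomic<unsigned long> total{0};
    std::atomic<bool> abortRequested{false};
};

// Portable core of OnProgress(): record the counters, then report whether
// the transfer should continue (true) or be cancelled (false).
inline bool RecordProgress(DownloadControl& c, unsigned long done, unsigned long max)
{
    c.progress = done;
    c.total = max;
    return !c.abortRequested;
}

#ifdef _WIN32
#define NOMINMAX
#include <windows.h>
#include <urlmon.h>
#pragma comment(lib, "urlmon.lib")
#pragma comment(lib, "uuid.lib")

// Hypothetical callback; it lives on the worker's stack, so it is not ref-counted.
class AbortableCallback : public IBindStatusCallback {
public:
    explicit AbortableCallback(DownloadControl& c) : ctl(c) {}
    STDMETHODIMP QueryInterface(REFIID riid, void** ppv) override {
        if (IsEqualIID(riid, IID_IUnknown) || IsEqualIID(riid, IID_IBindStatusCallback)) {
            *ppv = static_cast<IBindStatusCallback*>(this);
            return S_OK;
        }
        *ppv = nullptr;
        return E_NOINTERFACE;
    }
    STDMETHODIMP_(ULONG) AddRef() override { return 1; }
    STDMETHODIMP_(ULONG) Release() override { return 1; }
    STDMETHODIMP OnStartBinding(DWORD, IBinding*) override { return E_NOTIMPL; }
    STDMETHODIMP GetPriority(LONG*) override { return E_NOTIMPL; }
    STDMETHODIMP OnLowResource(DWORD) override { return S_OK; }
    STDMETHODIMP OnStopBinding(HRESULT, LPCWSTR) override { return E_NOTIMPL; }
    STDMETHODIMP GetBindInfo(DWORD*, BINDINFO*) override { return E_NOTIMPL; }
    STDMETHODIMP OnDataAvailable(DWORD, DWORD, FORMATETC*, STGMEDIUM*) override { return E_NOTIMPL; }
    STDMETHODIMP OnObjectAvailable(REFIID, IUnknown*) override { return E_NOTIMPL; }
    STDMETHODIMP OnProgress(ULONG done, ULONG max, ULONG, LPCWSTR) override {
        return RecordProgress(ctl, done, max) ? S_OK : E_ABORT;
    }
private:
    DownloadControl& ctl;
};

// Runs the blocking download on a worker thread and asks it to abort
// if it has not finished within `deadline`.
inline HRESULT DownloadWithDeadline(const wchar_t* url, const wchar_t* file,
                                    std::chrono::seconds deadline, DownloadControl& ctl)
{
    std::future<HRESULT> result = std::async(std::launch::async, [&] {
        AbortableCallback cb(ctl);
        return URLDownloadToFileW(nullptr, url, file, 0, &cb);
    });
    if (result.wait_for(deadline) == std::future_status::timeout)
        ctl.abortRequested = true;   // honoured at the next OnProgress() call
    return result.get();             // still waits for URLDownloadToFileW to return
}
#endif
```

A caller might create a DownloadControl, call DownloadWithDeadline(url, path, std::chrono::seconds(30), ctl), and afterwards read ctl.progress and ctl.total.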

Even after URLDownloadToFile() returns S_OK to indicate that the download completed, you still have to check that progress == filesize, because URLDownloadToFile() can mistakenly drop a download and still return S_OK, for example when the connection goes through a bridge of local network interfaces and the bridge goes down for some reason.
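The check described above could be wrapped in a small helper. This is only a sketch: DownloadLooksComplete is a hypothetical name, it takes plain integers so that it does not depend on the Windows headers, and treating an unreported size (0) as "not verified" is an assumption rather than documented behaviour.

```cpp
#include <cstdint>

// hr is the value returned by URLDownloadToFile(); progress and fileSize are
// the last ulProgress / ulProgressMax values seen in OnProgress().
inline bool DownloadLooksComplete(std::int32_t hr, std::uint32_t progress, std::uint32_t fileSize)
{
    if (hr < 0) return false;         // same test as FAILED(hr): negative means error
    if (fileSize == 0) return false;  // assumption: size never reported, cannot verify
    return progress == fileSize;      // every announced byte has arrived
}
```

With the callback from this answer, the call would look like DownloadLooksComplete(hr, status.GetProgress(), status.GetFileSize()).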

Also pay attention to calling DeleteUrlCacheEntry() together with URLDownloadToFile() to free disk space after the download, because all downloaded content is cached on disk by default according to the IE caching policy.
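As a rough sketch of that pairing, assuming the wide-character variants of both functions; PurgeFromCache and DownloadAndPurge are hypothetical helper names, and the code is guarded with #ifdef _WIN32 because it only makes sense on Windows:

```cpp
#ifdef _WIN32
#include <windows.h>
#include <urlmon.h>
#include <wininet.h>
#pragma comment(lib, "urlmon.lib")
#pragma comment(lib, "wininet.lib")

// Drops the cached copy of `url`, if there is one.
// Returns false when nothing was deleted (for example, no such cache entry).
inline bool PurgeFromCache(const wchar_t* url)
{
    return DeleteUrlCacheEntryW(url) != FALSE;
}

// Downloads synchronously, then removes the cache entry whatever the outcome.
inline HRESULT DownloadAndPurge(const wchar_t* url, const wchar_t* file)
{
    HRESULT hr = URLDownloadToFileW(nullptr, url, file, 0, nullptr);
    PurgeFromCache(url);
    return hr;
}
#endif
```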

Lyubomyr
2

Something as simple as the sample below should do the trick if you want to just download the file synchronously:

HRESULT hRez = URLDownloadToFile( NULL, _T(<url>), _T(<file>), 0, NULL );
if( hRez == S_OK ){
 // download ok
}
else{
 // download failed
}
Eugen Constantin Dinca
  • 3
    Huh? A return value of zero (S_OK) indicates the download *started* successfully. It does *not* indicate that the download has finished. It will return S_OK "even if the file cannot be created and the download is canceled." – Rob Kennedy Mar 13 '11 at 20:09
  • 1
    I use this function in one of my apps and have to run it in another thread as it is definitely synchronous. Perhaps it only works asynchronously if you pass an IBindStatusCallback pointer (which I don't). – Rob Mar 13 '11 at 20:55
1

The documentation says the final parameter is a pointer to "the IBindStatusCallback interface of the caller." That means you, as the caller, need to provide a pointer to something that implements that interface. You could start with an implementation like this:

class CBindStatusCallback: public IBindStatusCallback
{
public:
  STDMETHODIMP OnProgress(ULONG ulProgress, ULONG ulProgressMax,
    ULONG ulStatusCode, LPCWSTR szStatusText)
  {
    // write your implementation here
    return S_OK;
  }
  // Override GetBindInfo and the other IBindStatusCallback methods
  // by simply returning E_NOTIMPL, like this:
  STDMETHODIMP GetBindInfo(DWORD* /*grfBINDF*/, BINDINFO* /*pbindinfo*/)
  {
    return E_NOTIMPL;
  }

  // Provide the usual implementations for these IUnknown methods.
  STDMETHODIMP QueryInterface(REFIID riid, void** ppv);
  STDMETHODIMP_(ULONG) AddRef();
  STDMETHODIMP_(ULONG) Release();
};

Create an instance of that, get its IBindStatusCallback interface pointer, and pass it to the API function. Something like this:

CBindStatusCallback* obj = new CBindStatusCallback;
IBindStatusCallback* callback = NULL;
HRESULT hr = obj->QueryInterface(IID_IBindStatusCallback, reinterpret_cast<void**>(&callback));
obj = NULL;
hr = URLDownloadToFile(..., callback);
callback->Release();
callback = NULL;
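The class above only declares QueryInterface, AddRef and Release. One conventional way to write them is sketched below, under the assumption that the object is created with new and starts with a reference count of zero, which matches the usage snippet. CountedCallback is a hypothetical name, and the remaining methods are stubbed so that the class can be instantiated.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <urlmon.h>
#pragma comment(lib, "urlmon.lib")
#pragma comment(lib, "uuid.lib")

class CountedCallback : public IBindStatusCallback {
public:
    STDMETHODIMP QueryInterface(REFIID riid, void** ppv) override {
        if (!ppv) return E_POINTER;
        if (IsEqualIID(riid, IID_IUnknown) || IsEqualIID(riid, IID_IBindStatusCallback)) {
            *ppv = static_cast<IBindStatusCallback*>(this);
            AddRef();                 // the caller now owns one reference
            return S_OK;
        }
        *ppv = nullptr;
        return E_NOINTERFACE;
    }
    STDMETHODIMP_(ULONG) AddRef() override {
        return static_cast<ULONG>(InterlockedIncrement(&refs));
    }
    STDMETHODIMP_(ULONG) Release() override {
        ULONG left = static_cast<ULONG>(InterlockedDecrement(&refs));
        if (left == 0) delete this;   // last reference gone: destroy the object
        return left;
    }

    // Stubs for the rest of IBindStatusCallback.
    STDMETHODIMP OnStartBinding(DWORD, IBinding*) override { return E_NOTIMPL; }
    STDMETHODIMP GetPriority(LONG*) override { return E_NOTIMPL; }
    STDMETHODIMP OnLowResource(DWORD) override { return S_OK; }
    STDMETHODIMP OnProgress(ULONG, ULONG, ULONG, LPCWSTR) override { return S_OK; }
    STDMETHODIMP OnStopBinding(HRESULT, LPCWSTR) override { return E_NOTIMPL; }
    STDMETHODIMP GetBindInfo(DWORD*, BINDINFO*) override { return E_NOTIMPL; }
    STDMETHODIMP OnDataAvailable(DWORD, DWORD, FORMATETC*, STGMEDIUM*) override { return E_NOTIMPL; }
    STDMETHODIMP OnObjectAvailable(REFIID, IUnknown*) override { return E_NOTIMPL; }

private:
    volatile LONG refs = 0;           // assumption: starts at 0, first QueryInterface makes it 1
};
#endif
```

Because the object deletes itself in Release(), it should only be created with new and never placed on the stack.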

You'll probably want to pass some sort of information to the object's constructor so that it knows how to notify the rest of your program that the download has terminated. Until your program receives that notification, you can just let it sit in the usual idle state in its message pump.

Rob Kennedy
0

This might help.

Using Internet Explorer to download files for you

Rob