
A tool I'm writing is responsible for downloading thousands of image files over a matter of many hours. Originally, using TIdHTTP, I would Get the file(s) into a TMemoryStream, and then save that to a file, so long as there were no exceptions. In order to improve speed, I changed the TMemoryStream to a TFileStream.

However, if the resource is not found, or any other exception occurs that leaves no actual data to save, an empty file is still written to disk.

Completely understandable, since I simply create a file stream just prior to the download...

FileStream:= TFileStream.Create(FileName, fmCreate);
try
  Web.Get(AURL, FileStream);
finally
  FileStream.Free;
end;

I know I could simply delete the file if there was an exception. But it seems far too sloppy. I'm sure there's a more appropriate method of aborting such a situation.

How can I avoid saving a file when there was an exception, without hurting performance (if at all possible)?

Jerry Dodge
  • You can write your own wrapper over `TFileStream` that will delay creating a file until there is some data to write to it. – EugeneK Oct 21 '17 at 02:25
  • I would create my own file stream class that calls `CreateFile` with the `FILE_SHARE_DELETE` share mode. When an Indy exception is raised, I'd call the `DeleteFile` function (e.g. from a method of that class), which marks the file for deletion and deletes it when the last handle to the file is closed (which is when the stream is released). – Victoria Oct 21 '17 at 03:07
  • I would create a `TFileStream` descendant with a boolean member, and set that member only if the download is successful, and then have the destructor delete the file if the boolean is not set. – Remy Lebeau Oct 21 '17 at 03:32
  • _I know I could simply delete the file if there was an exception. But it seems far too sloppy._ **It's not that sloppy at all**. Actually, that's the main point of exception handling: It's a hook to rollback state that was previously changed but is no longer appropriate as a result of the exception. (One could argue about the sloppiness of exception handling in general; but that leads to discussions about stateless and functional programming.) – Disillusioned Oct 21 '17 at 09:49
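
For illustration, the descendant suggested in the comments might look roughly like the sketch below. The names `TDownloadFileStream` and `Commit` are hypothetical (they are not part of the RTL or Indy), and the unit names assume a Delphi version with scoped unit names.

```pascal
unit DownloadStream;

interface

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical TFileStream descendant: the file is deleted when the
  // stream is destroyed unless Commit was called first.
  TDownloadFileStream = class(TFileStream)
  private
    FKeep: Boolean;
    FPath: string;
  public
    constructor Create(const APath: string);
    destructor Destroy; override;
    procedure Commit;
  end;

implementation

constructor TDownloadFileStream.Create(const APath: string);
begin
  inherited Create(APath, fmCreate);
  // Assigned only after the file was created successfully, so a failed
  // Create never deletes a file this stream did not create.
  FPath := APath;
end;

destructor TDownloadFileStream.Destroy;
begin
  inherited Destroy; // closes the file handle first
  if (not FKeep) and (FPath <> '') then
    DeleteFile(FPath);
end;

procedure TDownloadFileStream.Commit;
begin
  FKeep := True; // mark the download as successful
end;

end.
```

With this in place, the code from the question would create a `TDownloadFileStream` instead of a `TFileStream` and call `Commit` on the line right after `Web.Get`, inside the same `try` block, so it is reached only when `Get` did not raise.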

2 Answers


How can I avoid saving a file when there was an exception, without hurting performance (if at all possible)?

This isn't possible in general. Errors and failures can happen at any step of the way, including part way through the download. Once this point is understood, you must accept that the file can be partially downloaded and then abandoned. So where do you store the partial data in the meantime?

The obvious choices are memory and a file. You don't want to store it in memory, which leaves a file.

This takes you back to your current solution.

I know I could simply delete the file if there was an exception.

This is the correct approach. There are a few variants on this. For instance you might download to a temporary file that is created with flags to arrange its deletion when closed. Only if the download completes do you then copy to the true destination. This is the approach that a browser takes. But the basic idea is to download to file and deal with any failure by tidying up.
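
A minimal sketch of that idea, assuming Indy's default behaviour of raising an exception on HTTP errors. The helper name `DownloadToFile` and the `.part` suffix are made up for this example, and it renames the temporary file rather than copying it, which only works as a cheap move when both names are on the same volume.

```pascal
unit SafeDownload;

interface

uses
  System.SysUtils, System.Classes, IdHTTP;

// Hypothetical helper: downloads to a temporary name and only gives the
// file its real name once the download has completed.
procedure DownloadToFile(Web: TIdHTTP; const AURL, FileName: string);

implementation

procedure DownloadToFile(Web: TIdHTTP; const AURL, FileName: string);
var
  TempName: string;
  Stream: TFileStream;
begin
  TempName := FileName + '.part'; // suffix is an assumption of this sketch
  try
    Stream := TFileStream.Create(TempName, fmCreate);
    try
      Web.Get(AURL, Stream);
    finally
      Stream.Free; // close the handle before renaming or deleting
    end;
    // fmCreate would have overwritten an existing file, so mimic that.
    DeleteFile(FileName);
    if not RenameFile(TempName, FileName) then
      raise Exception.CreateFmt('Could not rename %s', [TempName]);
  except
    DeleteFile(TempName); // tidy up the partial or empty file
    raise;                // let the caller see the original error
  end;
end;

end.
```

The data is still streamed straight to disk, so the performance characteristics should stay close to the `TFileStream` version in the question; the only extra work is one rename on success or one delete on failure.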

David Heffernan

Instead of downloading the entire image in one go, you could consider using HTTP range requests if the server supports them. Then you could split the file into smaller parts, requesting the next part after the previous one finishes (or even requesting multiple parts at the same time to increase performance). If there is an exception, you can abort the remaining requests, so they never start in the first place.

YouTube and a number of streaming media sites started doing this a while ago. It used to be if you started playing a video, then paused it, then it would eventually cache the entire video. Now it only caches a little ahead of the current position. This saves a ton of bandwidth because of the abandon rate for videos.

You could write the partial file to disk or keep it in memory.
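
As a rough sketch of a single range request with `TIdHTTP`: the helper name `GetChunk` is hypothetical, and the `Range` header is set by hand through `Request.CustomHeaders` here because that works the same way regardless of which range-related properties a given Indy version offers.

```pascal
unit RangeDownload;

interface

uses
  System.SysUtils, System.Classes, IdHTTP;

// Hypothetical helper: requests bytes AFrom..ATo of AURL and writes the
// response body to Dest. Returns True only if the server answered with
// 206 Partial Content, i.e. it honoured the Range header.
function GetChunk(Web: TIdHTTP; const AURL: string; Dest: TStream;
  AFrom, ATo: Int64): Boolean;

implementation

function GetChunk(Web: TIdHTTP; const AURL: string; Dest: TStream;
  AFrom, ATo: Int64): Boolean;
begin
  // The header stays on Web.Request until it is changed or removed, so a
  // caller reusing Web for full downloads has to reset it afterwards.
  Web.Request.CustomHeaders.Values['Range'] :=
    Format('bytes=%d-%d', [AFrom, ATo]);
  Web.Get(AURL, Dest); // raises on HTTP errors by default
  // A 200 response means the server ignored the header and sent the
  // whole resource, which has then been written to Dest in full.
  Result := Web.ResponseCode = 206;
end;

end.
```

A caller would loop over consecutive ranges and simply stop issuing requests once one of them raises. Since nothing guarantees that an arbitrary server supports ranges, the `False` result has to be handled as a normal case rather than as an error.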

Jim McKeeth
  • This doesn't really address the question that was asked. – David Heffernan Oct 23 '17 at 07:24
  • This *could* be a good answer, if it explained "*they never start in the first place*" a bit more. However, the tool does download from arbitrary URLs, so there is zero guarantee that the server supports it. – Jerry Dodge Dec 19 '17 at 02:14