5

I am creating a Delphi application to download files from the Internet and if the server supports range requesting it will be multi threaded. The progress is also relayed back to the GUI.

The current software model uses TThread components. The GUI calls a TDownloadThread which then spawns the TDownloadPartThreads - these are the threads which actually do the downloading over 'WinHttp'.

My problem: The CPU is used up, even for one download where there are only 4 threads downloading.

My Supposed Causes:

  1. I am writing to the destination file every 8192 bytes, and was wondering if I should buffer it before writing in one block?
  2. The thread communication is done via Synchronize(MainForm.UpdateProgress(Downloaded, TotalSize)) which I have heard is AWFUL to do, maybe I should share an object between the threads so I can access this using a timer on the main form, to update progress?

My Solutions

  1. Stagger the file writing and only write every x bytes.

  2. Update the TThread components to use OmniThreadLibrary and send the data back to the main form somehow. Each TDownloadPart thread would then become an IOmniWorker and send back its progress by sharing an object with the main form. The main form would then use a timer to update its progress, like: ProgressBar1.Position := sharedDataObject.Progress;

Hopefully someone can point me in the right direction!

Hzmy
  • 769
  • 7
  • 16

4 Answers4

2

I would use shared object to update state - just like you suggest in your second solution. If you only share an 8-byte (4 is not enough!) file size and if you make sure that address of each shared location is 8-aligned you can use interlocked instructions to modify this shared state and you won't event need locking.

The simplest way to maintain shared state would be TGp8AlignedInt64 record from the GpStuff unit, which would work equally well with OmniThreadLibrary- or TThread-based solution.

TGp8AlignedInt64 = record
  function  Add(value: int64): int64; inline;
  function  Addr: PInt64; inline;
  function  CAS(oldValue, newValue: int64): boolean;
  function  Decrement: int64; overload; inline;
  function  Decrement(value: int64): int64; overload; inline;
  function  Increment: int64; overload; inline;
  function  Increment(value: int64): int64; overload; inline;
  function  Subtract(value: int64): int64; inline;
  property Value: int64 read GetValue write SetValue;
end; 

All operations on this record are thread-safe, so you can safely do .Add in the worker thread and call .Value from the timer event in the main form at the same time.

gabr
  • 26,580
  • 9
  • 75
  • 141
  • So, to clarify, I would just have to declare `sharedData : TGp8AlignedInt64;` in a common unit accessible to both the worker and main form? What if I wanted to share two Int64 variables per thread, would I just use two of the above records? – Hzmy Nov 14 '11 at 20:08
  • Yes. For example, you can use a list/array of those records, one for each worker. – gabr Nov 14 '11 at 20:18
1

Yup - it all sorta depends upon the frequency of the GUI updates. TThread.Synchronize is the worst possible option for main thread updates - the download threads are forced to wait until the GUI update is performed before they can continue. The next on the list is PostMessaging updates and this fills the middle ground - the download threads no longer have to wait but, if the GUI cannot keep up with the posted messages, it will freeze anyway long before the 10000 message-limit for WMQ is reached. Where there are many threads and rapid updates, a timer polling some suitable notification object/list/array/whatever is a sensible solution.

Martin James
  • 24,453
  • 3
  • 36
  • 60
1

My advice is not to "guess" what is slow, but use a profiler and measure where the CPU is burnt. I suspect you may be surprised.

WinHTTP is not slow, and does not use a lot of CPU by itself. It is much faster than WinINet, and is working very well (at least with the flat C API - or are you using the COM interface?). Perhaps there is something wrong in your code.

About your questions:

  1. Writing chunk size of 8192 bytes does make sense, and won't be much faster (when compared to the HTTP stream downloading speed) if you use a bigger buffer. Windows file system usually write data on disk by 4 KB, and will do the buffering all by itself. Just try to make it bigger (e.g. 65536) but I don't think change will be noticeable.

  2. Synchronize is not so awful. What you can do is to call it only if you change for some percents (e.g. every 5% or 10%) instead of each time. You can do that inside the downloading thread, just by adding a private variable containing the latest notified size.

Another possibility could be to use some read-only properties (DownloadedSize + TotalSize: Int64) to the thread class, then update their content during the downloading. Then use a TTimer - or create a custom message (WM_USER+...), then use PostMessage() in the downloading thread - in the main GUI thread to refresh the progress bar if needed, for each thread. This is safe to read some property from the main thread.

Arnaud Bouchez
  • 42,305
  • 3
  • 71
  • 159
  • Thanks! In regards to your last point, would I just add `property ReadSize: Int64 read GetSize;` into my `TThread` object? Then poll that value with a `TTimer` on my main form? All this without any critical sections? – Hzmy Nov 14 '11 at 20:17
  • @Hzmy Yes you can do that. Since TotalSize won't change, and DownloadedSize will increase, there won't be any issue (if you use Cardinal values, you can consider reading will be atomic). It may have some border-side effect, if you use an Int64 kind of variable (if you download more than 4 GB of data): in this case, you may have a smaller number than expected (lower DWORD set before higher DWORD), but in this case, if you use a 5% increase refresh rule for instance, there won't be any problem. To be sure, you can use a critical section, but I do not think it is mandatory for a progress bar. – Arnaud Bouchez Nov 15 '11 at 08:19
1

Synchronize() may slow your program down because you'll have your additional threads waiting on the main thread, however, it is not a burden on the CPU. Your computer won't be doing any more work than it was already, but the user may notice it if it affects the responsiveness to the GUI.

Writing to the disk can be IO intensive, but again, not a burden on the CPU. Sometimes your antivirus programs will increase CPU usage on writes and reads, and an encrypted file system will introduce a small bit of additional CPU usage.

Since neither of the two issues you mentioned are, by themselves, affecting the CPU usage, changing them will not necessarily alleviate the problem.

Perhaps you are updating the GUI too often, or the code updating the GUI is too CPU intensive? What happens if you stop making call backs to update the GUI? Does it reduce the CPU usage?

Perhaps the code that is writing to the disk is processing it in some way? How are you buffering the data?

Definitely find out what the real cause of the high CPU usage is.

If you find that your bottleneck is anything other than the Internet bandwidth, most likely, there is an issue.

Marcus Adams
  • 53,009
  • 9
  • 91
  • 143
  • Just found out my CPU bottleneck was because I had the code `while not(self.Terminated) do begin end;` inside of my `TDownloadThread` which waited for the main form to close it so I could effectively have a cancellation mechanism inside my program. I fixed this by putting `Sleep(300);` inside of the `while` loop. Thanks for the answers, I am still working on my GUI updating though. – Hzmy Nov 14 '11 at 20:11
  • I prefer to send messages rather than poll statuses. You can force the download thread to create a message queue with PeekMessage, then from your main VCL thread, store the process ID for each thread, and use PostThreadMessage to notify the thread. – Marcus Adams Nov 14 '11 at 20:33
  • If you are only reading/writing to shared object/s that outlive the forms and the threads, (eg. allocated in an initialization section), you don't need any shutdown cancellation mechanism at all, especially not a CPU loop or sleep loop. – Martin James Nov 15 '11 at 08:37