Can code running in a background thread be faster than in the main VCL thread in Delphi?

Question

If anybody has a lot of experience timing code running on the main VCL thread versus a background thread, I'd like to get an opinion. I have some heavy string-processing code running on the main thread of my Delphi 6 application. Each operation takes around 50 ms on a single thread on my quad-core i5. What makes me really suspicious is that the same code shows the same time per operation on an old Pentium 4, where I usually see code run about 4 times slower than on the quad core. I am beginning to wonder whether the code actually takes significantly less than 50 ms, and something about the main VCL thread, perhaps Windows message handling or Windows API calls, is creating an artificial "floor" for the operation. Note that an operation is triggered by an incoming request on a socket, if that matters, but the time measurement does not start until the data is fully received.

Before I undertake the work of moving all the code onto a background thread for testing, I am wondering if anyone has any general knowledge in this area. What have your experiences been with code running on and off the main VCL thread? Note that the timing measurements are taken when there is absolutely no user-triggered activity going on.

I'm also wondering if raising the priority of the thread to just below real-time would do any good. I've never seen much improvement in my run times when experimenting with those flags.

-- roschler

what are you using to measure the time for each operation? – hatchet - done with SOverflow Jul 19 '11 at 04:05

score 12 · Answer 1 · edited Jun 22 '17 at 07:19

Given that all threads have the same priority, as they normally do, there can't be a difference, for the reasons listed below. If you're seeing a difference, re-evaluate the code (make sure you run the same thing in both the VCL thread and the background thread) and make sure you time it properly:

- The compiler generates exactly the same code; it doesn't care whether the code is going to run in the main thread or in a background thread. In fact, you can put the whole code in a procedure and call that from both your worker thread's Execute() and from the main VCL thread.
- For the CPU, all cores and all threads are equal, unless it's a Hyper-Threading CPU, where not all cores are real; but then see the next bullet.
- Even if not all CPU cores are equal, your thread is very unlikely to stay on the same core: the operating system is free to move it around at will (and does actually schedule your thread to run on different cores at different times).
- Messaging overhead doesn't matter for the main VCL thread, because unless you're calling Application.ProcessMessages() manually, the message pump is simply stopped while your procedure does its work. The message pump is passive: your thread needs to request messages from the queue, but since the thread is busy doing your work, it's not requesting any messages, so there is no overhead there.

There's just one place where threads are not equal, and this can change the perceived speed of execution: It's the operating system that schedules threads to execution units (cores), and for the operating system threads have different priorities. You can tell the OS a certain thread needs to be treated differently using the SetThreadPriority() API (which is used by the TThread.Priority property).
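The suggestion above, to put the work in one procedure and call it from both places, can be sketched as a small console program. This is only a sketch under assumptions: `DoStringWork`, `AverageMs` and `TTimingThread` are hypothetical names, the loop body is a stand-in for the real string processing, and the run count of 100 is arbitrary. It times many runs with `QueryPerformanceCounter` and reports the average, so that a coarse timer cannot hide the real cost of a single run.

```pascal
program TimeStringWork;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

// Hypothetical stand-in for the string processing from the question.
procedure DoStringWork;
var
  I: Integer;
  S: string;
begin
  S := '';
  for I := 1 to 20000 do
    S := S + IntToStr(I);
end;

// Runs DoStringWork Runs times and returns the average time per run in ms.
function AverageMs(Runs: Integer): Double;
var
  Freq, T0, T1: Int64;
  I: Integer;
begin
  QueryPerformanceFrequency(Freq);
  QueryPerformanceCounter(T0);
  for I := 1 to Runs do
    DoStringWork;
  QueryPerformanceCounter(T1);
  Result := (T1 - T0) * 1000.0 / Freq / Runs;
end;

type
  // Hypothetical thread that calls exactly the same procedure.
  TTimingThread = class(TThread)
  protected
    procedure Execute; override;
  public
    ResultMs: Double;
  end;

procedure TTimingThread.Execute;
begin
  ResultMs := AverageMs(100);
end;

var
  T: TTimingThread;
begin
  Writeln(Format('main thread:       %.3f ms', [AverageMs(100)]));
  T := TTimingThread.Create(True);  // created suspended
  try
    T.Resume;   // Delphi 6 style; later Delphi versions prefer Start
    T.WaitFor;
    Writeln(Format('background thread: %.3f ms', [T.ResultMs]));
  finally
    T.Free;
  end;
end.
```

If both averages come out about the same, the thread the code runs on is not the cause, and the measurement method is the next thing to check.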

score 10 · Accepted Answer · edited May 23 '17 at 10:24


Without simple source code to reproduce the issue, and without knowing how you are timing your threads, it will be difficult to understand what is happening in your software.

It definitely sounds like one of:

- An architecture issue: how are your threads defined?
- A measurement issue: how are you timing your threads?
- A typical scaling issue of both the memory manager and the RTL string-related implementation.

About the last point, consider this:

- The current memory manager (FastMM4) does not scale well on multi-core CPUs; try a per-thread memory manager, like our experimental SynScaleMM. Note, for example, that the Free Pascal Compiler team recently wrote a new scaling MM from scratch to avoid this issue.
- Try changing the string processing implementation to avoid memory allocation (use static buffers) and string reference counting: every reference-count update produces a LOCK DEC/INC, which does not scale well on multi-core CPUs. Use per-thread, character-level processing, e.g. PChar on static buffers instead of string.

I'm sure that without string operations, you'll find that all threads are equivalent.

In short: neither the current Delphi MM nor the current string implementation scales well on multi-core CPUs. You have just run into a known issue of the current RTL. Read this SO question.
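To illustrate the character-level approach mentioned above, here is a minimal sketch. `CountWords` is a hypothetical example task, not the asker's actual processing. It takes the input as a `const string` parameter and walks it through a `PChar`, so the loop creates no temporary strings and therefore triggers no allocation or reference-count updates inside it. It assumes the text contains no embedded #0 characters, because the scan stops at the first one.

```pascal
program CountWordsSketch;
{$APPTYPE CONSOLE}

// Counts space-separated words by walking the characters directly.
// The const parameter avoids a reference-count update on the call,
// and the loop itself never creates a temporary string.
function CountWords(const S: string): Integer;
var
  P: PChar;
  InWord: Boolean;
begin
  Result := 0;
  InWord := False;
  P := PChar(S);            // points at a #0 terminator if S is empty
  while P^ <> #0 do
  begin
    if P^ = ' ' then
      InWord := False
    else if not InWord then
    begin
      InWord := True;       // first character of a new word
      Inc(Result);
    end;
    Inc(P);
  end;
end;

begin
  Writeln(CountWords('the quick brown fox'));  // prints 4
end.
```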

-- Arnaud Bouchez, answered Jul 19 '11 at 06:08

Doesn't the scaling issue affect all threads equally? ie: if operations are slow, they're equally slow across all threads including the main VCL? – Cosmin Prund Jul 19 '11 at 06:49
@ A Bouchez - but would the FastMM4/multi-core issue come into play with a strictly single threaded application? What attribute of FastMM4's memory management isn't scaling well on multi-core? – Robert Oschler Jul 19 '11 at 09:00
@ A Bouchez - I read your blog posts on Delphi's LOCK and other string management problems, so I can now see how they would hurt multi-core performance, but how would that be related to a strictly single threaded app? – Robert Oschler Jul 19 '11 at 09:52
@Cosmin From the FastMM4 point of view, there is no difference between threads, AFAIK. The sleep() call may make a difference. Could try to define NeverSleepOnThreadContention conditional. – Arnaud Bouchez Jul 19 '11 at 11:45
@Robert If your main VCL thread does nothing but wait for messages, there is indeed no reason to have a string/FastMM4 contention issue. See my first two points in this case. We'll need some source code to reproduce it; otherwise we are all just speaking theoretically. – Arnaud Bouchez Jul 19 '11 at 11:47
@A Bouchez. I am doing timing simply by recording the StartTime before the operation begins and calculating the time delta after it completes. What about the VCL thread waiting for messages could cause a string/FastMM4 contention issue? Since no processMessages() calls are happening inside my operation, the socket doesn't get a chance to cycle its processMessages loop until I'm done. Also, although I usually do include FastMM4 in my final releases, I have not yet included it in the project I'm talking about here. In Delphi 6 you have to manually include it. – Robert Oschler Jul 19 '11 at 13:00
A.Bouchez's tip about const string parameters might be a quick improvement for the OP, if he isn't already aware of it (click the "this SO question" link above) - – Warren P Jul 19 '11 at 13:08
@Robert The VCL wait for messages just calls the Windows API and uses neither the MM nor strings. It's a fairly low-consumption process (otherwise the whole Windows system would not be able to run). But I don't get what this "socket" you are talking about is. – Arnaud Bouchez Jul 20 '11 at 08:27
@A.Bouchez - re: "socket". Just mentioned it for context. The data that the operation processes is delivered via the socket. The socket sits in a wait-for-messages loop until data arrives but as I said, my code is "post" that loop so it should not interfere. However, I try to provide as much detail as possible when posting because sometimes somebody points out some nasty idiosyncrasy in a particular component, library, or module, like your excellent blog post on LOCK calls in the Delphi string libraries, that I would have no idea would be a problem. – Robert Oschler Jul 20 '11 at 08:42

score 6 · Answer 3 · edited Jul 19 '11 at 06:09


When your code has control of the VCL thread, for instance if it is in one method and doesn't call out to any VCL controls or call Application.ProcessMessages, then the run time will not be affected just because it's in the main VCL thread.

There is no overhead, since you "own" the whole processing power of the thread when you are in your own code.

I would suggest that you use a profiling tool to find where the actual bottleneck is.

-- Nat, answered Jul 19 '11 at 05:20 (edited by David Heffernan, Jul 19 '11 at 06:09)

Stack Overflow policy is that you don't sign your posts because they already come with your name and mugshot. I edited it to that effect. – David Heffernan Jul 19 '11 at 08:18
I used an old version of AQTime when I used Delphi 6, but you have other options. http://www.torry.net/pages.php?id=1525 – Warren P Jul 20 '11 at 23:32

Warren P · Answer 4 · 2011-07-19T13:00:36.307

Performance can't be assessed statically. For that you need to get AQTime, or some other performance profiler for Delphi. I use AQTime, and I love it, but I'm aware it's considered expensive.

Your code will not magically get faster just because you moved it to a background thread. If anything, your all-inclusive-time until you see results in your UI might get a little slower, if you have to send a lot of data from the background thread to the foreground thread via some synchronization mechanisms.

If, however, you could execute parts of your algorithm in parallel, that is, split your work so that two or more worker threads process your data on a quad-core processor, then your total time to do a fixed load of work could decrease. That doesn't mean the code would run any faster, but depending on a lot of factors, you might achieve a slight benefit from multithreading, up to the number of cores in your computer. It's never ever going to be a 2x performance boost to use two threads instead of one, but you might get 20%-40% better performance in your multithreaded parallel solutions, depending on how scalable your heap is under multithreaded loads and how IO/memory/cache bound your workload is.
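A minimal sketch of such a split, assuming the work can be divided by line range: `TChunkWorker` and the length-summing task are hypothetical stand-ins for the real processing. Each worker reads only its own half of a list that nobody modifies while the threads run, so no locking is needed in this particular case.

```pascal
program SplitWork;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical worker: sums the lengths of Lines[First..Last].
  TChunkWorker = class(TThread)
  private
    FLines: TStrings;
    FFirst, FLast: Integer;
  protected
    procedure Execute; override;
  public
    Total: Integer;
    constructor Create(Lines: TStrings; First, Last: Integer);
  end;

constructor TChunkWorker.Create(Lines: TStrings; First, Last: Integer);
begin
  inherited Create(True);   // created suspended; started with Resume below
  FLines := Lines;
  FFirst := First;
  FLast := Last;
end;

procedure TChunkWorker.Execute;
var
  I: Integer;
begin
  Total := 0;
  for I := FFirst to FLast do
    Inc(Total, Length(FLines[I]));   // read-only access to the shared list
end;

var
  Lines: TStringList;
  A, B: TChunkWorker;
  I, Half: Integer;
begin
  Lines := TStringList.Create;
  try
    for I := 1 to 1000 do
      Lines.Add('line ' + IntToStr(I));
    Half := Lines.Count div 2;
    A := TChunkWorker.Create(Lines, 0, Half - 1);
    B := TChunkWorker.Create(Lines, Half, Lines.Count - 1);
    try
      A.Resume;
      B.Resume;
      A.WaitFor;              // wait for both halves before combining
      B.WaitFor;
      Writeln('total length: ', A.Total + B.Total);
    finally
      A.Free;
      B.Free;
    end;
  finally
    Lines.Free;
  end;
end.
```

Whether this is actually faster than one thread depends on the factors named above, in particular heap and string behaviour under concurrent load, so it is worth timing both variants.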

As for raising thread priorities, generally all you will do there is upset the delicate balance of your Windows system's performance. By raising the priorities you will achieve (sometimes) a nominal, but unrepeatable and non-guaranteeable increase in performance. Depending on the other things you do in your code, and your data sources, playing with priorities of threads can introduce subtle problems. See Dining Philosophers problem for more.
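For anyone who still wants to experiment with priorities and measure the effect, a small sketch follows. `TPriorityProbe` is a hypothetical class. It sets `TThread.Priority` on a single worker before starting it and then asks Windows, from inside the thread, which priority is in effect. It demonstrates the mechanism only and makes no claim that the change improves run times.

```pascal
program PrioritySketch;
{$APPTYPE CONSOLE}

uses
  Windows, Classes;

type
  // Hypothetical probe thread that records its own Windows priority.
  TPriorityProbe = class(TThread)
  protected
    procedure Execute; override;
  public
    Observed: Integer;
  end;

procedure TPriorityProbe.Execute;
begin
  // Ask Windows what priority this thread is running at.
  Observed := GetThreadPriority(GetCurrentThread);
end;

var
  T: TPriorityProbe;
begin
  T := TPriorityProbe.Create(True);   // created suspended
  try
    T.Priority := tpHigher;           // one step above normal, far from real-time
    T.Resume;
    T.WaitFor;
    Writeln('priority value seen inside the thread: ', T.Observed);
    Writeln('matches THREAD_PRIORITY_ABOVE_NORMAL: ',
      T.Observed = THREAD_PRIORITY_ABOVE_NORMAL);
  finally
    T.Free;
  end;
end.
```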

Your best bet for optimizing the speed of string operations is to first test it and find out exactly where it is spending most of its time. Is it heap operations? Memory copy and move operations? Without a profiler, even with advice from other people, you will still be committing a cardinal sin of programming: premature optimization. Be results oriented. Be science based. Measure. Understand. Then decide.

Having said that, I've seen a lot of horrible code in my time, and there is one thing people do that totally kills their threaded app performance: using TThread.Synchronize too much.

Here's a pathological (extreme) case that, sadly, occurs in the wild fairly frequently:

   // Anti-pattern: every loop iteration hands DoWork to the main thread.
   procedure TMyThread.Execute;
   begin
     while not Terminated do
       Synchronize(DoWork);
   end;

The problem here is that 100% of the work is really done in the foreground, other than the "while not Terminated" check, which executes in the thread context. To make the above code even worse, add a non-interruptible sleep.

For fast background thread code, use Synchronize sparingly or not at all, and make sure the code it calls is simple and executes quickly, or better yet, use TThread.Queue or PostMessage if you could really live with queueing main thread activity.
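By contrast, a sketch of the sparing use recommended above might look like the unit below. `TBatchWorker`, `PublishResult` and the digit-appending loop are hypothetical. The heavy part runs entirely in the thread, and Synchronize is called once, with a short method, to hand the finished result to an object owned by the main thread. It assumes a VCL application whose main thread is processing messages, which Synchronize needs in order to complete.

```pascal
unit BatchWorker;

interface

uses
  Classes, SysUtils;

type
  // Hypothetical worker: works in the background and touches the main
  // thread only once, to publish the finished result.
  TBatchWorker = class(TThread)
  private
    FTarget: TStrings;      // owned by the main thread, e.g. a memo's Lines
    FResultText: string;
    procedure PublishResult;
  protected
    procedure Execute; override;
  public
    constructor Create(Target: TStrings);
  end;

implementation

constructor TBatchWorker.Create(Target: TStrings);
begin
  inherited Create(True);   // suspended; the caller starts it with Resume
  FTarget := Target;
end;

procedure TBatchWorker.Execute;
var
  I: Integer;
  S: string;
begin
  S := '';
  I := 0;
  // Stand-in for the real work; all of it runs in the thread context.
  while (not Terminated) and (I < 10000) do
  begin
    Inc(I);
    S := S + IntToStr(I mod 10);
  end;
  FResultText := Format('%d characters built', [Length(S)]);
  Synchronize(PublishResult);   // one short hop to the main thread
end;

procedure TBatchWorker.PublishResult;
begin
  // Runs in the main thread; kept deliberately small and quick.
  FTarget.Add(FResultText);
end;

end.
```

From a form, this could be started with something like `TBatchWorker.Create(Memo1.Lines).Resume`, where `Memo1` is an assumed memo control on that form.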

@Warren P - thanks for the AQTime tip. Hopefully they do still fully support Delphi 6. – Robert Oschler Jul 19 '11 at 12:55
@Warren - "Be science based. Measure. Understand. Then decide." - +1 – Vector Jul 19 '11 at 17:27

score 1 · Vector · Answer 5 · 2011-07-19T17:21:04.430

Are you asking if a background thread would be faster? If your background thread would run the same code as the main thread and there's nothing else going on in the main thread, you don't stand to gain anything with a background thread. Threads should be used to split and distribute processing loads that would otherwise contend with one another and/or block one another when running in the main thread. Since you seem to be dealing with a case where your main thread is otherwise idle, simply spawning a thread to run slow code will not help.

Threads aren't magic, they can't speed up slow code or eliminate processing bottlenecks in a particular segment not related to contention on the main thread. Make sure your code isn't doing something you don't know about and that your timing methodology is correct.

My first hunch would be that your interaction with the socket is affecting your timing in a way you haven't detected... (I know you said you're sure that's not involved - but maybe check again...)

-- Vector, answered Jul 19 '11 at 05:54

Moving the big, long processing to the background threads pays off even if the main VCL thread is idle: It keeps the VCL responsive. If I'm going to wait 2 minutes for something to finish, I'd rather see a responsive window with a progress bar, not a non-responding window. – Cosmin Prund Jul 19 '11 at 06:01
@Mikey. The reason I pointed out my use of sockets is because I am worried it may play a part, I didn't mean to imply that I was sure it wasn't, just pointed out that I don't start timing until after the operation query has been received from the socket. The socket library I use sits in a custom process messages loop. However once my code is hit, it does not call process messages and since it is single threaded I'm not sure how the socket library's message loop could affect things. – Robert Oschler Jul 19 '11 at 09:06
@Cosmin - LOL - that's what I said: "and there's nothing else going on in the main thread.... distribute processing loads that would otherwise contend with one another and/or block one another when running in the main thread" – Vector Jul 19 '11 at 14:55
@Robert - regardless, if I were in your position and KNEW the code should run faster with a better processor, then I'd look at my timing methodology, with particular attention to socket interactions, which can be tricky, as I'm sure you know. **BUT** - have you tried isolating just the processor-intensive string manipulation code in a test environment free from any other possible side effects, and timed it with the two different CPUs? You have evidence that code should run faster with a different CPU and logic dictates such, but have you PROVED IT? Perhaps you're 'jumping to confusions' :-) – Vector Jul 19 '11 at 15:06
@Mikey. Not yet but that's definitely a good idea for a test. – Robert Oschler Jul 19 '11 at 16:29
@Robert - we're trying to be 'computer scientists'... apply the scientific method - PROVE IT with empirical evidence - that's the idea behind test driven development - prove your code works the way you think it does, don't just assume it. – Vector Jul 19 '11 at 17:24
One of the great aspects of the scientific method is that you should prove it both positively and negatively. Put this code in, test, take that code out, repeat exactly the same test with every other condition held invariant. Repeat tests more than once, also. Once you have both tests, and do every test in positive/negative pairs, with replicates, you have a much better idea of what is what. I learned that technique from an analytical chemistry book, and it seems to work great for anything you want to study in the real world. – Warren P Jul 20 '11 at 23:03
@warren - good advice - I try to follow this practice for anything that's not absolutely trivial - time permitting. Also important to think about false positives that can fall through the cracks in normal unit testing. Sometimes I find myself writing 6 or 8 tests for a method to test for different permutations that could create false positives or false negatives, etc. – Vector Jul 20 '11 at 23:21
