I have a Delphi 6 application that sends bitmaps to a DirectShow DLL in real time, at 25 frames a second. The DirectShow DLL is my code too and is also written in Delphi 6, using the DSPACK DirectShow component suite. I have a simple block of code that goes through each pixel in the bitmap and modifies the brightness and contrast of the image if a certain flag is set; otherwise the bitmap is pushed out of the DirectShow DLL unmodified (it is a push source video filter). The code used to be in the main application, and I just moved it into the DirectShow DLL. When it was in the main application it ran fine and I could see the changes in the bitmap as expected. However, now that the code resides in the DirectShow DLL, it has the following problems:

  1. When the code block below is active the DirectShow DLL is really slow. I have a quad core i5 and it's really slow. I can also see a big spike in the CPU consumption. In contrast, the very same code running in the main application ran fine on an old single core P4. It did hit the CPU noticeably on that old machine but the video was smooth and there were no problems. The images are only 352 x 288 pixels in size.

  2. I don't see the expected changes to the visible bitmap. I can trace the code in the DirectShow DLL and see the numerical values of each pixel properly altered by the code, but the viewable image in the Graph Edit ActiveMovie window looks completely unchanged.

  3. If I deactivate the code, which I can do in real-time, the ActiveMovie window shows video that is as smooth as glass, perfectly rendered with the CPU barely touched. If I reactivate the code the video is now really choppy, probably showing only 1 to 2 frames a second with a long delay before the first frame is shown, and the CPU spikes. Not completely, but a lot more than I would expect.

I tried compiling the DirectShow DLL with everything on including range checking, overflow checking, etc. and there were no warnings or errors during run-time. I then tried compiling for fastest speed and it still had the exact same problems listed above. Something is really wrong and I can't figure out what. Note, I do indeed lock the canvas before modifying the bitmap and unlock it after I'm done. If it weren't for the "everything on" compilation run I noted above I'd say it felt like an FPU Exception was being raised and silently swallowed with every pixel computation, but as I said, no errors or Exceptions are occurring.

UPDATE: I am putting this here so that the solution, which is embedded in one of Roman R.'s comments, is plainly visible. The problem was that I was not setting the PixelFormat property to pf24bit before accessing the ScanLine property. As Roman suggested, not doing this must make the TBitmap code create a temporary copy of the bitmap. As soon as I added the line of code below, the problems went away, both the changes not being visible and the soft page faults. It's an insidious problem because the only object that is affected is the pointer you use to access the ScanLine property, since (assumption) it contains a pointer to a temporary copy of the bitmap. That must be why the subsequent TextOut() call still worked: it operated on the original copy of the bitmap.

clip.PixelFormat := pf24bit; // The missing code line that fixes the problem.

Here's the code block I've been referring to:

function IntToByte(i: Integer): Byte;
begin
 if i > 255 then
   Result := 255
 else if i < 0 then
   Result := 0
 else
   Result := i;
end;

// ---------------------------------------------------------------

procedure brightnessTurboBoost(var clip: TBitmap; rangeExpansionPowerOf2: integer; shiftValue: Byte);
var
   p0: PByte;
   x,y: Integer;
begin
   if (rangeExpansionPowerOf2 = 0) and (shiftValue = 0) then
       exit; // These parameter settings will not change the pixel values.

   for y := 0 to clip.Height-1 do
   begin
       p0 := clip.scanline[y];

       // Can't just do the whole buffer as a big block of bytes since the
       //  individual scan lines may be padded for CPU alignment.
       for x := 0 to (clip.Width - 1) * 3 do
       begin
           if rangeExpansionPowerOf2 >= 1 then
               p0^ := IntToByte((p0^ shl rangeExpansionPowerOf2) + shiftValue)
           else
               p0^ := IntToByte(p0^ + shiftValue);

           Inc(p0);
       end;
   end;
end;
Robert Oschler
  • I can suggest several minor performance enhancements to this code but that would not account for the fact that your code ran fine on a P4. I would suggest getting hold of a sampling profiler and see what it says about where the excess time is being spent. – 500 - Internal Server Error Jan 05 '12 at 23:36
  • FPU exception? I don't see any reason to use the FPU at all. Also, you don't need to clamp negative values; the entire expression is unsigned. And even silenced exceptions should be visible to the debugger. – OnTheFly Jan 05 '12 at 23:39
  • @500-InternalServerError. I'd still like to hear the suggestions if you would, thanks. – Robert Oschler Jan 05 '12 at 23:51
  • @user539484 - I have had weird things happen with FPU mask differences between Delphi and C/C++ DLLs. I have no idea if that is what is happening now, I'm just grasping at straws. But for the performance to be so bad, it feels almost as if some hidden sub-routine is firing every time I access/modify a pixel, and that would kill the speed drastically and might also account for the image being unchanged despite seeing the pixel values changing in the debugger. CPU ring protection faults, FPU silent exception, meteor showers, thread thrashing, I don't know. As I said, I'm just hypothesizing. – Robert Oschler Jan 05 '12 at 23:55
  • In addition to u5's point about the redundant < 0 test, it's probably better to expand the > 255 test inline. Also, hoist the >= 1 test out of the loop (have two loops, one with the shl and one without). clip.Width can also be cached in a local, but I don't know how much impact that would have. – 500 - Internal Server Error Jan 06 '12 at 01:03
  • Feel free to post a screen shot from the CPU window if you don't read ASM and want us to verify that there's no floating point math in this routine. – 500 - Internal Server Error Jan 06 '12 at 01:04

1 Answer

There are a few things to say about this code snippet.

  1. First of all, you are using the Scanline property of the TBitmap class. I have not worked with Delphi for many years, so I might be wrong about this, but I am under the impression that Scanline is not actually a thin accessor, is it? It might be hiding work internally that can dramatically affect performance, such as "if the caller wants to access the bits of the image, then we first have to convert it to a DIB before returning pointers". So something that looks this simple might turn out to be the killer.

  2. "if rangeExpansionPowerOf2 >= 1 then" in the inner loop body? You don't really want to repeat this comparison for every byte. Either make two separate functions or duplicate the whole loop in two versions, one for zero and one for non-zero rangeExpansionPowerOf2, and do this "if" only once.

  3. "for ... to (clip.Width - 1) * 3 do": I am not really sure that Delphi optimizes the upper bound evaluation so that it happens only once. You might be doing that multiplication three times for every pixel, when you could do it just once for the whole image.

  4. For top performance, IntToByte would definitely be implemented with MMX, to avoid the ifs and process multiple bytes at once.

Still, since you say that the images are only 352x288, I would suspect that #1 is what is ruining the performance.
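To make points 2 and 3 concrete, here is a minimal sketch that also includes the PixelFormat assignment from the question's update. The name BrightnessTurboBoostSketch and the locals lastByte and v are hypothetical, and the code assumes the same uses clause as the original (Graphics for TBitmap and pf24bit, plus whatever unit supplies PByte there). It has not been profiled, so treat it as an illustration rather than a tuned implementation. Note that the row bound is written as Width * 3 - 1 so that all three bytes of the last pixel are covered; the original bound of (Width - 1) * 3 stops two bytes short on every row.

```pascal
// Sketch only: assumes a bitmap that can be treated as 24 bits per pixel.
procedure BrightnessTurboBoostSketch(clip: TBitmap;
  rangeExpansionPowerOf2: Integer; shiftValue: Byte);
var
  p: PByte;
  x, y, lastByte, v: Integer;
begin
  if (rangeExpansionPowerOf2 = 0) and (shiftValue = 0) then
    Exit; // These settings would leave every pixel unchanged.

  // Fix the pixel layout before touching ScanLine (the fix from the UPDATE).
  clip.PixelFormat := pf24bit;

  // Index of the last colour byte in a row, computed once for the image.
  lastByte := clip.Width * 3 - 1;

  for y := 0 to clip.Height - 1 do
  begin
    // Rows may be padded, so fetch each row pointer separately.
    p := clip.ScanLine[y];

    // The shift test is made once per row instead of once per byte.
    if rangeExpansionPowerOf2 >= 1 then
      for x := 0 to lastByte do
      begin
        v := (p^ shl rangeExpansionPowerOf2) + shiftValue;
        if v > 255 then
          v := 255; // Only the upper clamp is needed; v is never negative.
        p^ := Byte(v);
        Inc(p);
      end
    else
      for x := 0 to lastByte do
      begin
        v := p^ + shiftValue;
        if v > 255 then
          v := 255;
        p^ := Byte(v);
        Inc(p);
      end;
  end;
end;
```

The clamp is expanded inline and checks only the upper bound, as suggested in the comments on the question. Whether any of this matters next to the cost of the Scanline accessor itself is something only a profiler can tell you.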

Roman R.
  • I traced the code and I did see the TBitmap class calling FreeImage with each Scanline access. However, that leaves the symptom of the bitmap modifications not being seen in the rendered image as an open question. Thanks for the optimization tips. – Robert Oschler Jan 06 '12 at 18:55
  • This is exactly what I mean: if `FreeImage` is there, one can forget about performance. And this also answers your other question about not seeing changes: you might be modifying a temporary copy. Note that code snippets instruct you to assign the `PixelFormat` property before accessing `Scanline`; are you not doing this? This might be the immediate cause, but in any event you have to arrange things so that the `Scanline` accessor does not do any conversions or allocations, except perhaps once, the first time it is called. – Roman R. Jan 06 '12 at 19:03
  • Thank you very much. Not assigning PixelFormat was the problem. Now I can see the changes in the image and the soft page faults are gone. However, it was not due to FreeImage. There is no way to avoid the FreeImage call because the property getter TBitmap.GetScanLine() immediately calls Changing(), and that method immediately calls FreeImage, unconditionally. Something about not setting PixelFormat does indeed result in a temporary copy which, even more strangely, only exists during the ScanLine access, since a subsequent TextOut() still works. Thanks again. – Robert Oschler Jan 06 '12 at 19:48