
I have a simple program that creates an OmniThreadLibrary worker pool in the 'initialization' section of a unit and destroys that same pool in the 'finalization' section of the unit. This works fine as long as we do not use EurekaLog. If we include EurekaLog, an access violation is raised during finalization of the application (after closing the app). This happens only once every 3 to 10 runs, so it seems to be some sort of timing issue.

Everything seems to work fine if we create the worker pool in the "normal" application flow instead (not in the initialization and finalization sections of a separate unit).

The code of this unit is as follows:

unit Unit1;

interface

uses
  OtlParallel;

var
  _Worker: IOmniBackgroundWorker;

implementation

initialization
  _Worker := Parallel.BackgroundWorker.NumTasks(10)
    .Execute(
      procedure(const AWorkItem: IOmniWorkItem)
      begin
        //
      end
    );

finalization
  _Worker.Terminate(INFINITE);
  _Worker := nil;

end.
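
For reference, the "normal application flow" variant mentioned above could look like the following sketch. TMainForm and its event handlers are assumed names (the question does not show this code); the idea is simply that the worker's lifetime is tied to the main form, so it is created and torn down inside the normal VCL flow rather than during unit initialization/finalization:

procedure TMainForm.FormCreate(Sender: TObject);
begin
  // Same worker as in Unit1, but created once the VCL is fully up.
  _Worker := Parallel.BackgroundWorker.NumTasks(10)
    .Execute(
      procedure(const AWorkItem: IOmniWorkItem)
      begin
        //
      end
    );
end;

procedure TMainForm.FormDestroy(Sender: TObject);
begin
  // Torn down before unit finalization order comes into play.
  _Worker.Terminate(INFINITE);
  _Worker := nil;
end;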

The main application is also simple:


uses
  {$IFDEF EurekaLog}
  EMemLeaks,
  EResLeaks,
  EDebugExports,
  EDebugJCL,
  EFixSafeCallException,
  EMapWin32,
  EAppVCL,
  EDialogWinAPIMSClassic,
  EDialogWinAPIEurekaLogDetailed,
  EDialogWinAPIStepsToReproduce,
  ExceptionLog7,
  {$ENDIF EurekaLog}
  Vcl.Forms,
  Unit1 in 'Unit1.pas';

{$R *.res}

begin
  Application.Initialize;
  Application.MainFormOnTaskbar := True;
  Application.Run;
end.

The callstack on the access violation is:

:5653e4c4 
OtlTaskControl.TOmniTask.Execute
OtlThreadPool.TOTPWorkerThread.ExecuteWorkItem($393A160)
OtlThreadPool.TOTPWorkerThread.Execute
System.Classes.ThreadProc($3976800)
EThreadsManager.NakedBeginThreadWrapper(???)
:76ee6359 KERNEL32.BaseThreadInitThunk + 0x19
:77628944 ntdll.RtlGetAppContainerNamedObjectPath + 0xe4
:77628914 ntdll.RtlGetAppContainerNamedObjectPath + 0xb4

I use the latest master checkout of OTL and EurekaLog version 7.9.1.4 update 1 hot-fix 4.

Is the way we create and destroy the worker pool correct? And, if so, is anybody familiar with issues when OTL and EurekaLog are used together?

Laurens
  • Use the debugger to find out exactly where the AV occurs and add that information to your question. A call stack could be useful. Also, add a minimal reproducible example. It is likely that while creating this minimal example, you'll find the issue. – fpiette Feb 15 '21 at 10:17
  • Hi, I have a minimal example and can send it to you, but all of the code is already in the post above :-) I can also post a call stack here (of all 10+ threads if you like), but that only refers to internal OTL code and I'm not the maintainer of that code. – Laurens Feb 15 '21 at 13:03
  • Don't send a minimal example to me: edit your question and use that example instead of the code you showed. It is important that the example is REALLY MINIMAL and self contained. No one here will spend time writing code to reproduce your problem. YOU must provide that code (.pas and .dfm file). About the call stack: Only the call stack for the thread that triggers the AV. Once you get it, put a break point on the first line that is in your own code and restart. Examine if at that line everything is as expected. – fpiette Feb 15 '21 at 13:23
  • The above example is minimal :-) There is no DFM, only the DPR code (included above) and the code in Unit1 (included above). The code we depend on (EurekaLog and OTL) is not included but is necessary to reproduce it. The call stack sometimes refers to EurekaLog and sometimes to OTL. It seems that EurekaLog tries to shut down while some threads are still running. In my opinion, that could hint at a problem in OTL, because it should shut down all threads after finalization. – Laurens Feb 16 '21 at 08:08
  • I lack the tools required to test myself :-( What does Eurekalog and OTL support say? – fpiette Feb 16 '21 at 08:33
  • EurekaLog says `Overall, this seems to be some sort of memory or synchronization issue in OTL, so I would suggest to take it to OTL devs.`, so I have contacted Primož Gabrijelčič and hope he has a clue. Another option for us is to use the Delphi TThreadPool from the Parallel Programming Library so we have more control over the threadpool. – Laurens Feb 16 '21 at 12:32

1 Answer

It appears to be a flaw in OTL - a piece of cleanup that was never implemented:

destructor TOmniTaskControl.Destroy;
begin
  // TODO 1 -oPrimoz Gabrijelcic : ! if we are being scheduled, the thread pool must be notified that we are dying !

It looks like this code path can leave a task scheduled in the thread pool that is still running.

Both TOmniTaskControl and TOmniTask share the same TOmniSharedTaskInfo instance. If the TOmniTaskControl is destroyed during the finalization of OtlTaskControl, then any TOmniTask still running in the thread pool holds a reference to the already deleted TOmniSharedTaskInfo.

This is "not a problem" in an app without EurekaLog, since the memory of the deleted object is left unchanged, so accessing the already deleted object happens to succeed. Adding EurekaLog, however, means that disposed memory is erased with a debug pattern, so accessing the already deleted object may fail.
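
The effect can be sketched in isolation (this is illustrative code, not OTL code; TShared, Owner, Stale and Tmp are made-up names):

type
  TShared = class
    FValue: IInterface;
  end;

var
  Owner, Stale: TShared;
  Tmp: IInterface;
begin
  Owner := TShared.Create;
  Stale := Owner;       // second, non-owning reference, like otSharedInfo_ref in TOmniTask
  Owner.Free;           // the owner destroys the object, like TOmniTaskControl.Destroy
  Tmp := Stale.FValue;  // use after free: goes through System.@IntfCopy
end;

With the default memory manager the freed block often still holds the old field values, so the last line happens to "work"; with EurekaLog's memory checking the block is overwritten on disposal and the same line raises an access violation.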

Specifically, the code crashes when TOmniTask tries to access the already deleted TOmniSharedTaskInfo:

procedure TOmniTask.InternalExecute(calledFromTerminate: boolean);
// ...
begin
  otCleanupLock.EnterWriteLock;
// ...
      finally
        otCleanupLock.EnterWriteLock;
        if assigned(otSharedInfo_ref.ChainTo) and
           (otSharedInfo_ref.ChainIgnoreErrors or (otExecutor_ref.ExitCode = EXIT_OK))
        then
          chainTo := otSharedInfo_ref.ChainTo; // - fails here inside System.@IntfCopy
        otSharedInfo_ref.ChainTo := nil;
// ...

The inaccessible :5653e4c4 location is just an address that System.@IntfCopy tries to read when treating the trashed fields of otSharedInfo_ref as a valid interface reference.

You can confirm that this is a "use after free" issue by disabling the "Catch memory problems" option in EurekaLog (which overwrites the object/interface VMT on disposal) and setting the "When memory is released" option to "Do nothing". With those settings, the crash when accessing the already deleted otSharedInfo_ref / TOmniSharedTaskInfo inside TOmniTask.InternalExecute no longer occurs - indeed, because the deleted otSharedInfo_ref is now left unchanged and thus remains accessible.

Instead, a new memory leak will now be reported:

OtlSync.pas TOmniCriticalSection.Create
OtlSync.pas CreateOmniCriticalSection
OtlSync.pas TOmniCS.Initialize
OtlTaskControl.pas TOmniCS.Acquire
OtlTaskControl.pas TOmniTask.InternalExecute
OtlTaskControl.pas TOmniTask.Execute
OtlThreadPool.pas TOTPWorkerThread.ExecuteWorkItem
OtlThreadPool.pas TOTPWorkerThread.Execute
System.Classes.pas ThreadProc

This leak occurs at otSharedInfo_ref.MonitorLock.Acquire inside TOmniTask.InternalExecute, close to the previously mentioned chainTo := otSharedInfo_ref.ChainTo.

This is just another confirmation of the "use after free" bug: normally, the ostiMonitorLock: TOmniCriticalSection field of TOmniSharedTaskInfo would be finalized when the otSharedInfo_ref: TOmniSharedTaskInfo instance is destroyed as part of TOmniTaskControl's destruction.

However, with the "use after free" bug, the TOmniTaskControl is destroyed while a TOmniTask is still alive and holding a reference to the same TOmniSharedTaskInfo. An attempt to call otSharedInfo_ref.MonitorLock.Acquire then re-initializes ostiMonitorLock: TOmniCriticalSection with a brand new critical section. And since the TOmniTaskControl and its TOmniSharedTaskInfo are already gone, no code is left to clean up this newly created critical section. Indeed, a new critical section has just been created as a field inside an already deleted object (TOmniSharedTaskInfo)!
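
The re-initialization happens because of OTL's lazy-initialization locking pattern. A simplified sketch of that pattern (TLazyLock is an illustrative stand-in for OtlSync.TOmniCS, not the actual OTL code):

type
  TLazyLock = record
    FLock: IOmniCriticalSection;
    procedure Acquire;
  end;

procedure TLazyLock.Acquire;
begin
  if not Assigned(FLock) then
    FLock := CreateOmniCriticalSection;  // created on first use
  FLock.Acquire;
end;

When such a record lives inside an already freed (and wiped) object, FLock reads as nil, Acquire allocates a fresh critical section into dead memory, and nothing ever releases it - which matches the leak shown in the call stack above.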

Alex