I have a system that loads text files that are zipped into a ".log" file and parses them into informational classes using multiple threads; each thread deals with a different file and adds the parsed objects to a list. Each file is loaded using TStringList, since it was the fastest method that I tested.
The number of text files varies, but normally I have to deal with 5 to 8 files ranging from 50 MB to 120 MB in one run.
My problem: the user can load .log files as many times as they desire, and after some of those loads I receive an EOutOfMemory exception when calling TStringList.LoadFromFile. Of course, the first thing that comes to mind for anyone who has ever used a TStringList is that you should not use it for big text files, but this exception happens randomly and only after the process has already completed successfully at least once (the objects are destroyed before a new parse starts, so the memory is reclaimed correctly apart from some minor leaks).
I tried using TextFile and TStreamReader, but they are not as fast as TStringList, and the duration of the process is the greatest concern with this feature.
I'm using Delphi 10.1 Berlin. The parse process is a simple iteration through the list of variable-length lines and the construction of objects based on the line info.
Essentially, my question is: what is causing this, and how can I fix it? I may use other ways to load the file and read its contents, but they must be as fast as (or faster than) the TStringList method.
Loading thread execute code:
uses
  System.Classes, System.SysUtils, System.SyncObjs, System.Generics.Collections;
  // TLogFile and TLogCommand come from the application's own units (not shown)

type
  TThreadFactory = class(TThread)
  protected
    // Class that holds the list of Commands already parsed, is owned outside of the thread
    _logFile: TLogFile;
    _criticalSection: TCriticalSection;
    _error: string;
    // Path of the extracted text file; its assignment is not part of this simplified listing
    _path: string;
    procedure Execute; override;
  public
    constructor Create(AFile: TLogFile; ASection: TCriticalSection); overload;
    destructor Destroy; override; // implementation not shown
    property Error: string read _error;
  end;

implementation

{ TThreadFactory }

constructor TThreadFactory.Create(AFile: TLogFile; ASection: TCriticalSection);
begin
  inherited Create(True);
  _logFile := AFile;
  _criticalSection := ASection;
end;

procedure TThreadFactory.Execute;
var
  tmpLogFile: TStringList;
  tmpConvertedList: TList<TLogCommand>;
  tmpLine: string;
begin
  try
    tmpConvertedList := TList<TLogCommand>.Create;
    try
      if (_path <> '') and not Terminated then
      begin
        tmpLogFile := TStringList.Create;
        try
          tmpLogFile.LoadFromFile(_path);
          for tmpLine in tmpLogFile do
          begin
            if Terminated then
              Break;
            if tmpLine <> '' then
            begin
              // the logic here was simplified, but that's essentially all it does
              tmpConvertedList.Add(TLogCommand.Create(tmpLine));
            end;
          end;
        finally
          tmpLogFile.Free;
        end;
      end;
      // Acquire outside the try so Release only runs if the lock was taken
      _criticalSection.Acquire;
      try
        _logFile.AddCommands(tmpConvertedList);
      finally
        _criticalSection.Release;
      end;
    finally
      FreeAndNil(tmpConvertedList);
    end;
  except
    on e: Exception do
      _error := e.Message;
  end;
end;

end.
Added: Thank you for all your feedback. I will address some issues that were discussed but that I failed to mention in my initial question.
The .log file has multiple .txt files inside, but it can also contain multiple .log files; each file represents a day's worth of logging or a period selected by the user. Since the decompression takes a lot of time, a thread is started every time a .txt is found so that I can start parsing immediately. This has shortened the noticeable waiting time for the user.
The "minor leaks" are not shown by ReportMemoryLeaksOnShutdown, and other methods like TStreamReader avoid this issue.
The list of commands is held by TLogFile. There is only one instance of this class at any time, and it is destroyed whenever the user wants to load a .log file. All threads add commands to the same object; that's the reason for the critical section.
I can't detail the parse process since it would disclose some sensitive information, but it's simple information gathering from the string and the TCommand.
From the beginning I was aware of fragmentation, but I never found concrete proof that TStringList causes fragmentation just by loading files multiple times. If this can be confirmed, I would be very glad.
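One way to look for that kind of proof, as a rough sketch only: log the size of the largest free region of the process address space before each load. If that number keeps shrinking across loads while total memory use looks stable, fragmentation is a plausible cause. The sketch is Windows-only, and it assumes a 32-bit build and that allocations of this size are served more or less directly by the OS; neither is confirmed above. `LargestFreeBlock` is a hypothetical helper name.

```pascal
program LargestFreeBlockSketch;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Walks the process address space with VirtualQuery and returns the size,
// in bytes, of the largest region whose state is MEM_FREE.
function LargestFreeBlock: NativeUInt;
var
  Info: TMemoryBasicInformation;
  Addr, Next: NativeUInt;
begin
  Result := 0;
  Addr := 0;
  // VirtualQuery returns 0 once Addr is past the end of user-mode space.
  while VirtualQuery(Pointer(Addr), Info, SizeOf(Info)) = SizeOf(Info) do
  begin
    if (Info.State = MEM_FREE) and (Info.RegionSize > Result) then
      Result := Info.RegionSize;
    Next := NativeUInt(Info.BaseAddress) + Info.RegionSize;
    if Next <= Addr then
      Break; // guard against wrap-around at the top of the address space
    Addr := Next;
  end;
end;

begin
  Writeln('Largest free block: ', LargestFreeBlock div (1024 * 1024), ' MB');
end.
```

Calling the helper right before each LoadFromFile and comparing the value with the size of the file about to be loaded would show whether a large enough contiguous block is still available at the moment the exception is raised.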
Thank you for your attention. I ended up using an external library that was capable of reading lines and loading files with the same speed as TStringList without the need to load the whole file into memory.
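For readers without such a library, the general idea of reading line by line without holding the whole file in memory can be sketched with the RTL's TStreamReader and a larger buffer than the default. This is not the external library's API, and it has not been benchmarked against TStringList; the 1 MB buffer size, the use of TEncoding.Default, and the name `CountNonEmptyLines` are assumptions made for illustration.

```pascal
program StreamReadSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Reads AStream one line at a time and returns the number of non-empty lines.
// Only the reader's buffer and the current line are held in memory.
function CountNonEmptyLines(AStream: TStream): Integer;
var
  Reader: TStreamReader;
  Line: string;
begin
  Result := 0;
  // Assumed settings: default ANSI encoding, BOM detection on, 1 MB buffer.
  Reader := TStreamReader.Create(AStream, TEncoding.Default, True, 1 shl 20);
  try
    while not Reader.EndOfStream do
    begin
      Line := Reader.ReadLine;
      if Line <> '' then
        Inc(Result); // a real parser would build its object from Line here
    end;
  finally
    Reader.Free; // does not free AStream; the caller owns it
  end;
end;

var
  FileStream: TFileStream;
begin
  if ParamCount > 0 then
  begin
    FileStream := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
    try
      Writeln(CountNonEmptyLines(FileStream));
    finally
      FileStream.Free;
    end;
  end;
end.
```

Because the routine takes a TStream rather than a file name, peak memory per thread is bounded by the buffer size plus the longest line instead of growing with the file size.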