Johan has given you a good answer about Boyer-Moore searching. BM is fine if
your are content to use it as a "black box", but if you want to understand what's going on,
there is a bit of a gulf between the complexity of your own code and a BM implementation.
You might find it helpful to explore searching that's more efficient than your own code
but not so complex as BM. There is one ultra-simple way to do what you want without
getting invoved with pointers, PChars, etc.
Let's leave aside for a moment the fact that you want to work with a TMemoryStream, and
consider finding the number of occurrences of a string SubStr
in another string Target
.
For efficiency, things you want to avoid are a) repeatedly scanning the same characters
over and over and b) copying one or both strings.
Since D7, Delphi has included a PosEx
function:
function PosEx(const SubStr, S: string; Offset: Cardinal = 1): Integer;
Description
PosEx returns the index of SubStr in S, beginning the search at Offset. If Offset is 1 (default), PosEx is equivalent to Pos.
PosEx returns 0 if SubStr is not found, if Offset is greater than the length of S, or if Offset is less than 1.
So what you can do is repeatedly call PosEx
, starting with Offset
= 1, and each time it
finds SubStr
in Target
you increment Offset
to skip over it, like this (in a console application):
function ContainsCount(const SubStr, Target : String) : Integer;
var
i : Integer;
begin
Result := 0;
i := 1;
repeat
i := PosEx(SubStr, Target, i);
if i > 0 then begin
Inc(Result);
i := i + Length(SubStr);
end;
until i <= 0;
end;
var
Count : Integer;
Target : String;
begin
Target := 'aa b ca';
Count := ContainsCount('a', Target);
writeln(Count);
readln;
end.
The fact that PosEx
and ContainsCount
both pass SubStr
and Target
as
consts meants that no string copying is involved, and it should be obvious
that ContainsCount
never scans the same characters more that once.
Once you've satisfied yourself that this works, you might care to trace
into PosEx
to see how it does its stuff.
You can do something which works in a similar way on PChars using the RTL functions StrPos
/AnsiStrPos
To convert your memory stream to a string, you could use this code from
Rob Kennedy's answer to this q Converting TMemoryStream to 'String' in Delphi 2009
function MemoryStreamToString(M: TMemoryStream): string;
begin
SetString(Result, PChar(M.Memory), M.Size div SizeOf(Char));
end;
(Note what he says about the alternative version later in his answer)
Btw, if you look through the VCL + RTL code, you'll see that quite a lot of the string-parsing and processing code (e.g. in TParser, TStringList, TExpressionParser) all does its work with PChars. There's a reason for that of course, because it minimizes character copying and means that most scanning operations can be done by changing pointer (PChar) values.