Delphi: What are faster pure Pascal approachs to find the position of a character in a Unicode string?

Question

Background

Added Later

I have made a pure Pascal function to find the position of a character in a Unicode string as follows:

function CharPosEx(const chChr: Char; const sStr: string;
    const iOffset: Integer=1): Integer;
var
  PStr   : PChar;
  PRunIdx: PChar;
  PEndIdx: PChar;
  iLenStr: Integer;

begin
  Result := 0;
  iLenStr := Length(sStr);
  if (iLenStr = 0) or (iOffset <= 0) or (iOffset > iLenStr) then Exit;

  PStr := Pointer(sStr);
  PEndIdx := @PStr[iLenStr - 1];
  PRunIdx := @PStr[iOffset - 1];

  repeat
    if PRunIdx^ = chChr then begin
      Result := PRunIdx - PStr + 1;
      Exit;
    end;
    Inc(PRunIdx);
  until PRunIdx > PEndIdx;
end;

I decide to not use the built-in StrUtils.PosEx() because I want to create a UTF16_CharPosEx function based on an optimized pure Pascal function of CharPosEx. I'm trying to find a faster generic solution like the pure Pascal approachs of the Fastcode Project.

The Original Statements

According to the accepted answer to the question, Delphi: fast Pos with 64-bit, the fastest pure Pascal function to find the position of a substring in a string is PosEx_Sha_Pas_2() of the Fastcode Project.

For the fastest pure Pascal function to find the position of a character in a string, I noticed that the Fastcode Project has CharPos(), CharPosIEx(), and CharPosEY() for a left-to-right matching, as well as CharPosRev() for a right-to-left matching.

However, the problem is that all Fastcode functions were developed before Delphi 2009, which was the first Delphi release that supports Unicode.

I'm interested in CharPos(), and CharPosEY(). I want to re-benchmark them because there are some optimization techniques that are useless nowadays, such as loop unrolling technique that was occasionally implemented in Fastcode functions.

However, I cannot recompile the benchmark project for each of the CharPos family challenges because I have been using Delphi XE3 here, therefore I cannot conclude which one is the fastest.

Questions

Anyone here know or can conlude which one is the fastest pure Pascal implementations for each of the mentioned Fastcode challenges, especially for CharPos() and CharPosEY()?

Other approaches out of the Fastcode Project solution are welcome.

Notes

The Unicode string term I used here refers to a string whose the type is UnicodeString regardless its encoding scheme.
If encoding scheme matters, what I mean is the fixed-width 16-bit encoding scheme (UCS-2).

@Rob - For the details, see [When, if ever, is loop unrolling still useful?](http://stackoverflow.com/questions/2349211/when-if-ever-is-loop-unrolling-still-useful) — Astaroth, Aug 09 '15 at 17:41
It's hard to answer this, because "the fastest" is really depending on a lot of factors. What kind of strings are you dealing with? Long strings or just a couple of characters? Do you plan to run the code on win32,win64,iOS,Android or MacOSX? On what type of processor? — Wouter van Nifterick, Aug 09 '15 at 17:46
@Wouter - I'm dealing with long strings. The code is targetted for both Win32 and Win64, that's why I prefer the pure Pascal approach. What I mean as "the fastest" is the fastest in general situations, not specific. — Astaroth, Aug 09 '15 at 17:49
Impossible to answer "fastest" without more specification of the input data — David Heffernan, Aug 09 '15 at 18:57
If this is a one-time task - assembler is your choice. Something like `scas`, as I remember @DavidHeffernan knows much more about modern assembler. To my great regret, I have been actively used an assembler about 20 years ago. However if you have some persistent set of the strings then you can associate to each something like `array[UnicodeChar] of Integer` (of course, if you do not taking memory in mind) and fill it for each string. So you will be able to get just by index where is Char first occurrence or, for example, how many characters in the string (depending what data in the array). — Abelisto, Aug 09 '15 at 20:13
scasw is not faster than an unrolled loop on a modern cpu. You will have to dig deep to find something faster (on average) than PosEx_Sha_Pas_2 for both win32 and win64. — LU RD, Aug 09 '15 at 20:27
To make your own test among the FastCode libraries, take out the purepascal solutions, adapt them for unicode and win64 and make your tests as suited for your application. I looked at the CharPosIEx zip file and it did not look too hard for such transition. — LU RD, Aug 09 '15 at 21:32
You can try usual unicodisation process - replacing all char/string with ansichar/ansistring. In the simplest case this will be enough. But I'm unsure FC is the simplest case. When I tried to borrow some of the FC routines for Unicode & x64 IDE's I found out that Alex Sharakhov's Pascal code is terrible. Magic constants, senseless variable names, poor formatting. It will take some efforts to understand the code and adopt it. — Fr0sT, Aug 10 '15 at 07:47
I'm not sure if Pos_Ex_Sha_Pas_2 is faster than a dedicated loop looking for exactly one char (WideChar). I doubt it. — Rudy Velthuis, Aug 10 '15 at 09:54
@Fr0sT; Ansification is always the wrong choice. Rather adpt your code to use Unicode. — Rudy Velthuis, Aug 10 '15 at 09:55
@DavidHeffernan - I have updated the title and some of my statements. I'm sorry, I'm in a hurry and have to leave now. I will comeback soon. — Astaroth, Aug 10 '15 at 11:11
@TobyAllen - The errors were related with AnsiChar & AnsiString issues, and the problem was there are hundreds of such functions. — Astaroth, Aug 10 '15 at 12:16
@LU_RD - OK, I will consider to take out the purepascal solutions and adapt them for Unicode as your recommendation. — Astaroth, Aug 10 '15 at 12:16
@RudyVelthuis ansification makes sense in two cases: 1) For routines that are intended to deal with 1-byte strings only (f.i., network request parsing and so on) and 2) To quickly check some routine that was written quite long ago. — Fr0sT, Aug 10 '15 at 13:31
@RudyVelthuis, tweaking PosEx_Sha_Pas_2 to only deal with single characters will gain a handful percent in speed. I tried, and is it really worth it? — LU RD, Aug 10 '15 at 22:16
Why not use CharPosEY_xxx_yyy_etc. or similar, for a single character search? — Rudy Velthuis, Aug 11 '15 at 08:12
IMHO, FastCode-like implementations will be really worth for applications in parser category. As far as I am concerned, most of popular pattern matching algorithms do not support very large alphabets such as Unicode, and therefore an optimized code for character/string matching will balance the slowness of Unicode string processing. — Astaroth, Aug 12 '15 at 04:43

score 3 · Accepted Answer · answered Aug 16 '15 at 18:09

Many of the solutions to find a character in a string amongst the fastcode examples, uses a technique to read the string in in larger chunks into a register and then analyze the register bytes for a match. this works fine when the characters are single bytes, but are not optimal when characters are 16 bit unicode.

Some examples even use a lookup table, but that is also not optimal in a unicode string search.

I find that the fastcode purepascal PosEx_Sha_Pas_2 string search routine works very good both in 32/64 bit mode even for single character search. You might as well use that routine.

I stripped out some parts not needed out of the PosEx_Sha_Pas_2 into CharPosEx_LU_Pas and gained some percent in execution time:

function CharPosEx_LU_Pas(c: Char; const S: string; Offset: Integer = 1): Integer;
var
  len: Integer;
  p, pStart, pStop: PChar;
label
  Loop0, Loop4,
  TestT, Test0, Test1, Test2, Test3, Test4,
  AfterTestT, AfterTest0,
  Ret;
begin;
  p := Pointer(S);

  if (p = nil) or (Offset < 1) then
  begin;
    Exit(0);
  end;

  len := PLongInt(PByte(p) - 4)^; // <- Modified to fit 32/64 bit
  if (len < Offset) then
  begin;
    Exit(0);
  end;

  pStop := p + len;
  pStart := p;
  p := p + Offset + 3;

  if p < pStop then
    goto Loop4;
  p := p - 4;
  goto Loop0;

Loop4:
  if c = p[-4] then
    goto Test4;
  if c = p[-3] then
    goto Test3;
  if c = p[-2] then
    goto Test2;
  if c = p[-1] then
    goto Test1;
Loop0:
  if c = p[0] then
    goto Test0;
AfterTest0:
  if c = p[1] then
    goto TestT;
AfterTestT:
  p := p + 6;
  if p < pStop then
    goto Loop4;
  p := p - 4;
  if p < pStop then
    goto Loop0;
  Exit(0);

Test3:
  p := p - 2;
Test1:
  p := p - 2;
TestT:
  p := p + 2;
  if p <= pStop then
    goto Ret;
  Exit(0);

Test4:
  p := p - 2;
Test2:
  p := p - 2;
Test0:
  Inc(p);
Ret:
  Result := p - pStart;
end;

I claim no originality to this snippet as it was a simple task to strip out those code parts not needed from PosEx_Sha_Pas_2.

Benchmark 32 bit (101 character string, last character matches): 50000000 repetitions.

System.Pos:       1547 ms
PosEX_Sha_Pas_2:  1292 ms
CharPosEx:        2315 ms
CharPosEx_LU_Pas: 1103 ms
SysUtils.StrScan: 2666 ms

Benchmark 64 bit (101 character string, last character matches): 50000000 repetitions.

System.Pos:      20928 ms
PosEX_Sha_Pas_2:  1783 ms
CharPosEx:        2874 ms
CharPosEx_LU_Pas: 1728 ms
SysUtils.StrScan: 3115 ms

@LU_RD - Many thanks for your excellent effort as well as your overview, that is I want to see. I'm sure many Delphi developers will take some benefit from it. — Astaroth, Aug 16 '15 at 18:46