Delphi 7 kind of works with Unicode in WideStrings, but it is not consistent:
var
  RegExprWLineSeparators: WideString;
begin
  // In the following line the literal ends up producing ASCII question marks
  // for the 5th and 6th character.
  RegExprWLineSeparators:= #$d#$a#$b#$c+ WideChar($2028)+ WideChar($2029)+ #$85;
  // Assigning the characters individually makes both correct - so first do the
  // assignment above (any placeholder works, since you reassign those positions
  // anyway) and then set them per character.
  RegExprWLineSeparators[5]:= WideChar($2028);
  RegExprWLineSeparators[6]:= WideChar($2029);
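One way to avoid the mixed literal altogether is to keep the code units as plain Word constants and fill the string in a loop. This is only a minimal sketch of that idea, written as a console program; WideStringFromWords and LineSeparatorCodes are names made up for this example and are not part of any library:

```pascal
program BuildSeparators;
{$APPTYPE CONSOLE}

const
  // The line separator code units, kept as plain Word values.
  LineSeparatorCodes: array[0..6] of Word =
    ($000D, $000A, $000B, $000C, $2028, $2029, $0085);

// Hypothetical helper: builds a WideString one code unit at a time.
function WideStringFromWords(const Codes: array of Word): WideString;
var
  i: Integer;
begin
  SetLength(Result, High(Codes) + 1);
  // Open array parameters are zero-based, WideString indexes are one-based.
  for i := 0 to High(Codes) do
    Result[i + 1] := WideChar(Codes[i]);
end;

var
  Separators: WideString;
begin
  Separators := WideStringFromWords(LineSeparatorCodes);
  // Check the two problematic positions by comparing as Word.
  WriteLn(Word(Separators[5]) = $2028);
  WriteLn(Word(Separators[6]) = $2029);
end.
```

Because the casts happen at run time on Word values, no compile-time conversion of a text literal is involved.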
A few characters can't be assigned at all (neither via text literal nor via ordinal literal), so you need a different approach to test for them:
var
  sText: WideString;
begin
  sText:= <something>;
  // Checking if the first character is a UTF-16 BE or LE BOM
  case Word(sText[1]) of
    $FEFF,
    $FFFE: Delete( sText, 1, 1 );  // Remove such a character
  end;
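The same check can be wrapped in a function that also guards against an empty string, since indexing the first character is not valid there. A rough sketch, assuming a console program; StripLeadingBom is a hypothetical name, and the sample string is built at run time from a Word variable so that no literal conversion is involved:

```pascal
program StripBomDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: returns S without a leading U+FEFF or U+FFFE.
function StripLeadingBom(const S: WideString): WideString;
begin
  Result := S;
  // Guard first: there is no first character in an empty string.
  if Length(Result) = 0 then
    Exit;
  // Compare as Word, exactly like the snippet above.
  case Word(Result[1]) of
    $FEFF,
    $FFFE: Delete(Result, 1, 1);
  end;
end;

var
  Sample: WideString;
  Code: Word;
begin
  // Build the sample per character from a Word variable.
  SetLength(Sample, 2);
  Code := $FEFF;
  Sample[1] := WideChar(Code);
  Code := Ord('A');
  Sample[2] := WideChar(Code);

  WriteLn(Length(StripLeadingBom(Sample)));  // one character remains
  WriteLn(Length(StripLeadingBom('')));      // empty input stays empty
end.
```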
Rules of thumb:
- use Words and cast them to WideChar when using text literals
- use Word over WideChar when comparing/checking
- noncharacters (like U+FFFE and U+FFFF) are usually unassignable
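To illustrate the last two rules together, here is a small sketch that looks for noncharacters by comparing Word values only. It covers just the noncharacters of the Basic Multilingual Plane (U+FDD0 to U+FDEF, U+FFFE and U+FFFF); IsBmpNoncharacter and FirstNoncharacterPos are hypothetical names invented for this example:

```pascal
program NoncharDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: True for U+FDD0..U+FDEF, U+FFFE and U+FFFF.
function IsBmpNoncharacter(Code: Word): Boolean;
begin
  Result := ((Code >= $FDD0) and (Code <= $FDEF)) or (Code >= $FFFE);
end;

// Hypothetical helper: one-based index of the first noncharacter, 0 if none.
function FirstNoncharacterPos(const S: WideString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if IsBmpNoncharacter(Word(S[i])) then
    begin
      Result := i;
      Exit;
    end;
end;

var
  Sample: WideString;
  Code: Word;
begin
  // Fill the sample per character from a Word variable at run time.
  SetLength(Sample, 3);
  Code := Ord('a');
  Sample[1] := WideChar(Code);
  Code := $FFFF;
  Sample[2] := WideChar(Code);
  Code := Ord('b');
  Sample[3] := WideChar(Code);

  WriteLn(FirstNoncharacterPos(Sample));
end.
```

Whether Delphi 7 accepts a given noncharacter as a constant is a separate question; the sketch only relies on run-time casts and Word comparisons.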
Using Delphi 7 for Unicode as a beginner should be avoided - do this only when you're confident with Unicode, UTF-16 and Pascal in general. I have been doing this since Windows 2000 with Delphi 5 and later continued with Delphi 7, running into issues like these on various occasions (regular expressions, among others).
As an alternative to a dated Delphi version you could try the free Lazarus IDE for Free Pascal (FPC) - it uses UTF-8 as its approach to Unicode and should handle text literals in code much better. The IDE even looks like the robust Delphi 7 one.