7

I am trying to replace spaces with a new line using the TPerlRegEx class.

with RegExp do
begin
  Subject:=Memo1.Lines.Text;
  RegEx:=' ';
  Replacement:='\r\n';
  ReplaceAll;
  Memo1.Lines.Text:=Subject;
end;

The problem is that it treats the \r\n replacement as literal text.

Jack G.
Joacim Andersson
  • Couldn't a StringReplace(AString, ' ', #13#10, [rfReplaceAll]) achieve the same? – Jack G. Jan 06 '13 at 18:25
  • Of course it could if this simple example was the actual code. I want to do the replacement with a Find/Replace dialog box in which the user enters the replacement text. – Joacim Andersson Jan 06 '13 at 18:35

3 Answers

8

Use #13#10

program Project29;

{$APPTYPE CONSOLE}

uses
  SysUtils, PerlRegEx;

var RegEx: TPerlRegEx;

function CStyleEscapes(const InputText:string):string;
var i,j: Integer;

begin
  SetLength(Result, Length(InputText));
  i := 1; // input cursor
  j := 1; // output cursor
  while i <= Length(InputText) do
    if InputText[i] = '\' then
      if i = Length(InputText) then
        begin
          // Erroneous input: a lone trailing backslash is copied through unchanged
          Result[j] := '\';
          Inc(i);
          Inc(j);
        end
      else
        begin
          case InputText[i+1] of
            'r', 'R': Result[j] := #13;
            'n', 'N': Result[j] := #10;
            't', 'T': Result[j] := #9;
            '\':
              begin
                Result[j] := '\';
                Inc(j);
                Result[j] := '\';
              end;
            else
              begin
                Result[j] := '\';
                Inc(j);
                Result[j] := InputText[i+1];
              end;
          end;
          Inc(i,2);
          Inc(j);
        end
    else
      begin
        Result[j] := InputText[i];
        Inc(i);
        Inc(j);
      end;
  SetLength(Result, j-1);
end;

begin
  RegEx := TPerlRegEx.Create;
  try

    RegEx.RegEx := ' ';
    RegEx.Replacement := CStyleEscapes('\t\t\t');
    RegEx.Subject := 'FirstLine SecondLine';
    RegEx.ReplaceAll;
    WriteLn(RegEx.Subject);

    ReadLn;

  finally RegEx.Free;
  end;
end.
Cosmin Prund
  • Well, it's really the user that enters the replacement in a Find/Replace dialog box. The real question is why TPerlRegEx doesn't translate \r\n? – Joacim Andersson Jan 06 '13 at 16:01
  • Apparently it only interprets the `\0`..`\n` group references and nothing else. I don't know about the `why`. Do the replacements yourself before filling in the `Replacement` property. – Cosmin Prund Jan 06 '13 at 16:37
  • Changed the code to include a `CStyleEscapes` function that does the conversion from `\r\n` to the expected characters even if TPerlRegEx doesn't. – Cosmin Prund Jan 06 '13 at 16:47
  • Thanks for the code example, I've already done something similar but I really wanted to know why it doesn't do the matching as expected. +1 for the effort. – Joacim Andersson Jan 06 '13 at 17:39
  • @JoacimAndersson Where in the documentation is the statement that \r and \n are escape sequences in replacement text? – David Heffernan Jan 06 '13 at 17:50
  • @JoacimAndersson, why would you think the `Replacement` string should understand those things that I call C-Style escape sequences? It's not documented that it should and I honestly fail to see why you'd think it does. Would you also expect it to handle html-style "entities" (ie: `&nbsp`)? Just for argument's sake, I tested the search-and-replace dialog in Word, it doesn't understand `\n` either! Should I complain to Microsoft? Nor does Delphi's search-and-replace. Nor does Notepad++... – Cosmin Prund Jan 06 '13 at 18:57
  • No, I wouldn't expect it to understand entities, but I do expect that a regular expression engine would understand regular expressions. The documentation does not mention these characters specifically, but it does mention \0x0a\0x0d and \u000a\u000d, and those don't work either. The regular expression engines in .Net, Java, and Perl do understand expressions in the replacement as well as in the search, and this is supposed to be Perl compatible. – Joacim Andersson Jan 06 '13 at 19:10
  • 1
    Besides you're wrong about Notepad++, it does this exact Find and Replace if you check the Regular Expression checkbox in the dialog. The same is true for Delphi. Word however doesn't have Regular Expression support but does have other special character replacements. – Joacim Andersson Jan 06 '13 at 19:20
  • You're right on both counts, I tested both Delphi and Notepad++ again, paying attention to the "RegEx" checkbox. – Cosmin Prund Jan 06 '13 at 19:33
  • @Cosmin The documentation doesn't match the behaviour, at least the way I read the documentation. Of course the documentation appears to be this site: http://www.regular-expressions.info/refreplace.html which is a bit of an odd place for it to live – David Heffernan Jan 06 '13 at 19:35
  • 3
    @JoacimAndersson As the comments revealed, your question evolved to be more about what escapes were understood by this component. You should have edited the question to include those details. – David Heffernan Jan 06 '13 at 19:36
  • @DavidHeffernan I don't think my question has evolved. It was and still is why the TPerlRegEx class doesn't understand regular expressions in the replacement. You may call them escape characters but they are just plain regexes. – Joacim Andersson Jan 06 '13 at 19:53
  • They are escape sequences. That's not really up for debate. Different regex flavours accept different escape sequences. – David Heffernan Jan 06 '13 at 20:11
  • I agree that you seem to be asking "why the TPerlRegEx class doesn't understand regular expressions in the replacement". I think you should update the question to state that very explicitly. Anyway, you've accepted this answer even though I can't see how it addresses "why the TPerlRegEx class doesn't understand regular expressions in the replacement". I explained why the code behaves as it does. As for why it was designed that way, I think you'd need to ask the designer. – David Heffernan Jan 06 '13 at 20:12
  • 1
    It can't be "regular expressions in the replacement", at most it could be regular-expressions-like escape sequences. – Cosmin Prund Jan 06 '13 at 20:14
6

I really wanted to know why it doesn't do the matching as expected.

Processing of \ escape sequences in the Replacement text is performed in TPerlRegEx.ComputeReplacement. If you take a look at the code you will find that there are no sequences that yield the carriage return and line feed characters. In fact ComputeReplacement is all about back references.

The processing of the matching phase of the regex is performed by the PCRE code. However, the replacement phase is pure Pascal code. And it's easy enough to inspect the code to see what it does. And it doesn't do what you think and expect it to do.

The conclusion is that you cannot specify the characters you want using escape sequences. I think you will need to devise your own rules for escaping non-printable characters and apply those rules in an OnReplace event handler.

David Heffernan
1

Edit, as I learned something new today.

I ran into the same problem as in the question a while ago, and drew the wrong conclusion that
TRegEx does not do any C-style backslash escape expansion at all.

The correct conclusion should have been that
TRegEx does not do C-style backslash escape expansion in the replacement string parameter, and that I should research whether it does so in the pattern string parameter.

I knew support of character escaping mechanisms varies by development tool.

For instance, C, C#, Java, Perl, PHP, Ruby, bash, and many more do backslash escape expansion.
But the Delphi compiler, not being a C-style compiler, doesn't.
It will expand Pascal-style escapes (like #13#10, or ^M^J) into CRLF though.
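
A minimal console sketch of that difference, assuming any reasonably recent Delphi compiler; the program and constant names are made up for illustration. The backslash literal stays four characters long, while both Pascal-style forms compile to the two-character CRLF sequence.

```pascal
program EscapeDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

const
  Backslashed = '\r\n';  // four literal characters: \ r \ n
  HashCrLf    = #13#10;  // Pascal-style character codes for CR and LF
  CaretCrLf   = ^M^J;    // control-character notation for the same two characters

begin
  WriteLn(Length(Backslashed));   // prints 4: the compiler did not expand anything
  WriteLn(Length(HashCrLf));      // prints 2
  WriteLn(HashCrLf = CaretCrLf);  // prints TRUE: both notations give CR+LF
end.
```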

So I did that research today (thanks David for pointing me to my initial mistake), and came up with two examples (one in Delphi and one in C#) that each have a function that basically does this:

  • show the result of matching a known CRLF string against a pattern that contains a given string
  • show the result of replacing a space with that same string

The example function is then called with:

  • a string that is a backslash-escaped \r\n literal in the source code, so it might be parsed by the compiler
  • a string that is put together character by character so that it becomes a backslash-escaped \r\n string at run-time, so it might get parsed by the RegEx engine

From the output in both examples, you see that:

  • The Delphi compiler does not parse the \r\n string
  • The C# compiler does parse the \r\n string
  • The RegEx engine in both Delphi and C# parses the pattern \r\n string at run-time (RegEx documentation)
  • The RegEx engine in both Delphi and C# does not parse the replacement \r\n string at run-time (RegEx documentation)
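
The last two observations can be checked on the Delphi side with a short sketch along these lines. It assumes a Delphi version that ships the System.RegularExpressions unit (older versions name it RegularExpressions without the System prefix), and the program name is hypothetical. No expected output is given here, since the engine's behaviour is exactly what is being tested.

```pascal
program RegExEscapeDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions; // assumption: unit-scoped names are available

var
  Replaced: string;

begin
  // Pattern side: is the four-character pattern \r\n read by the engine as CR+LF?
  WriteLn(TRegEx.Match(#13#10, '\r\n').Success);

  // Replacement side: does the engine expand \r\n into CR+LF?
  Replaced := TRegEx.Replace('FirstLine SecondLine', ' ', '\r\n');
  WriteLn(Pos(#13#10, Replaced) > 0); // TRUE only if the replacement was expanded
  WriteLn(Replaced);                  // shows what was actually inserted
end.
```

The Delphi compiler leaves both '\r\n' literals untouched, so whatever the program prints reflects the run-time behaviour of the RegEx engine alone.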

The recommendation still stands:

So either use the Pascal-style escapes, or use a C-Style backslash expansion function like Cosmin wrote.

As a side note: When using any expansion function, you should keep in mind that it will alter the meaning of text. Delphi users might not expect C-style expansion of strings.

Jeroen Wiert Pluimers
  • The statement "RegEx doesn't do the C-style backslash escape expansion in any development tool: any string expansions are part of the compiler or interpreter in your development tool." is not true. – David Heffernan Jan 07 '13 at 09:30
  • Is this any better? If not, why not? – Jeroen Wiert Pluimers Jan 07 '13 at 10:35
  • No. The statement is still factually incorrect. For example, see what this evaluates to: `TRegEx.Match(sLineBreak, '\r\n').Success` – David Heffernan Jan 07 '13 at 10:41
  • Crap, I ran into the same problem as the question a while ago, and deduced that the Delphi `TRegEx` does not expand backslashes at all. You are right: Delphi `TRegEx` (actually: the underlying RegEx library) does not expand the `replacement`, but does expand the `pattern`. It is even in the RegEx docs: http://www.regular-expressions.info/reference.html I stand corrected, and will update my answer with a few demo programs, as the same holds in .NET using C# as well (which is a different implementation, but appears to use the same rules). – Jeroen Wiert Pluimers Jan 07 '13 at 13:14
  • What I find a bit off is the replacement syntax docs (http://www.regular-expressions.info/refreplace.html). For JGsoft they state support for char code with \u0000 to \uFFFF, but that support is not present in the Delphi code. – David Heffernan Jan 07 '13 at 13:34
  • Even worse: the supposedly supported \x does not work at all. See the Delphi output at http://besharp.codeplex.com/SourceControl/changeset/view/97422#2376115 (the cause is `PCRE`) and the C# output shows that .NET supports neither http://besharp.codeplex.com/SourceControl/changeset/view/97422#2376152 – Jeroen Wiert Pluimers Jan 07 '13 at 14:22