I work with Delphi XE6.

I used to use this code to remove whitespace from a string:

function RemoveWhitespace(S: string): string;
begin
  Result := StringReplace(S, ' ', '', [rfReplaceAll]);
end;

But I have now realized that it only removes space characters, not other whitespace such as tabs or line breaks. (I had a similar problem in C#.)

Vlastimil Burián
  • There are hundreds of existing questions on this topic (remove whitespace, remove non-letters, remove letters, remove digits, remove non-digits, remove punctuation, remove non-punctuation, remove snowmen (☃), remove non-snowmen, etc.), and they are essentially all the same except for the predicate. Here's one good example with some benchmarking: https://stackoverflow.com/a/75158947/282848 – Andreas Rejbrand Mar 29 '23 at 08:48
  • Currently [Unicode defines 25 characters as whitespace](https://en.wikipedia.org/wiki/Whitespace_character#Unicode), but depending on your intent you may want to keep some of them (e.g. line breaks) and/or also remove characters that are not classified as whitespace (e.g. zero width space, Hangul filler or Braille pattern blank). Your code uses only the literal `' '` instead of naming multiple characters, so it works as written: a "_space_" is exactly that one character. – AmigoJack Mar 29 '23 at 10:42
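
The point about choosing which characters to remove can be sketched as follows. This is a minimal illustration, not code from the thread: the name `RemoveWhitespaceKeepLineBreaks` and the decision to keep CR/LF while also dropping U+200B (zero width space) are assumptions made for the example, and it assumes 1-based string indexing as on Delphi's desktop compilers.

```delphi
program SelectiveWhitespace;

{$APPTYPE CONSOLE}

uses
  System.Character;

// Hypothetical helper (not from the thread): removes whitespace except
// CR and LF, and additionally removes U+200B (zero width space).
function RemoveWhitespaceKeepLineBreaks(const S: string): string;
var
  I: Integer;
  C: Char;
  Keep: Boolean;
begin
  SetLength(Result, Length(S));   // worst case: every character is kept
  I := 0;
  for C in S do
  begin
    if (C = #13) or (C = #10) then
      Keep := True                // line breaks are deliberately preserved
    else
      Keep := (not C.IsWhiteSpace) and (C <> #$200B);
    if Keep then
    begin
      Inc(I);
      Result[I] := C;             // assumes 1-based string indexing
    end;
  end;
  SetLength(Result, I);           // trim to the characters actually kept
end;

begin
  // Space, tab and U+200B are dropped; the CR/LF pair survives.
  Assert(RemoveWhitespaceKeepLineBreaks('a b'#9'c'#13#10'd'#$200B'e') = 'abc'#13#10'de');
end.
```

Only the condition that computes `Keep` changes from one variant to the next; the surrounding loop stays the same, which is the sense in which such questions differ only by their predicate.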

1 Answer

An optimized version that allocates the result once, fills it in place, and then trims it to its final length:

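uses System.Character;

function RemoveWhiteSpace(const S: string): string;
var
  I: Integer;
  C: Char;
begin
  // Allocate room for the worst case: the input contains no whitespace.
  SetLength(Result, Length(S));
  I := 0;
  for C in S do
    if not C.IsWhiteSpace then
    begin
      // Copy each non-whitespace character into the next free slot.
      Inc(I);
      Result[I] := C;
    end;
  // Trim the result to the number of characters actually kept.
  SetLength(Result, I);
end;
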
HeartWare
  • Yes, this is how it is done. – Andreas Rejbrand Mar 29 '23 at 08:50
  • In what sense is this "optimized" version, please? Edit your answer to contain this info, thanks. – Vlastimil Burián Mar 29 '23 at 09:48
  • @VlastimilBurián: It only allocates a string heap object two times (two `SetLength` calls). Your code reallocates the string as many times as there are non-whitespace chars in the string. This makes your code much less efficient. Worst case, you allocate an empty string at some place in your computer's RAM. Then you add one char to this string, forcing the computer to create a new string heap object at some other location in RAM, copying the old (empty) string to that location, and adding the new char. Then the next time you find a non-whitespace char, you need to allocate yet another ... – Andreas Rejbrand Mar 29 '23 at 09:51
  • ...string object on the heap, copy the old (1-char) string to that location, and add the new char. Then the third time you find a non-whitespace char, you need to create a third string object on the heap, at some other location in RAM. Then you need to copy the old two-char string to that location, and add the new char. Then the fourth time you find a non-whitespace char, you need to create a fourth string object on the heap, at some other location in RAM. Then you need to copy the old three-char string to that location, and add the new char. ... – Andreas Rejbrand Mar 29 '23 at 09:53
  • ... Then the fifth time you find a non-whitespace char, you need to create a fifth string object on the heap, at some other location in RAM. Then you need to copy the old four-char string to that location, and add the new char. And so on! You can find some actual benchmarking here: https://stackoverflow.com/a/75158947/282848 – Andreas Rejbrand Mar 29 '23 at 09:53
  • (Except that the empty string heap object doesn't exist. But you get the point.) – Andreas Rejbrand Mar 29 '23 at 09:57
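
The allocation pattern described in these comments can be illustrated with a side-by-side sketch. `RemoveWhitespaceByAppending` is a hypothetical reconstruction of the append-in-a-loop approach being criticized, not code taken from the question; `RemoveWhitespacePreallocated` mirrors the answer. How often an append actually moves the string depends on the memory manager, so treat this as an illustration rather than a benchmark.

```delphi
program AppendVersusPreallocate;

{$APPTYPE CONSOLE}

uses
  System.Character;

// Hypothetical reconstruction of the pattern the comments criticize.
function RemoveWhitespaceByAppending(const S: string): string;
var
  C: Char;
begin
  Result := '';
  for C in S do
    if not C.IsWhiteSpace then
      Result := Result + C; // each append may grow, and possibly move, the string
end;

// Same approach as the answer: size the buffer once, trim it once.
function RemoveWhitespacePreallocated(const S: string): string;
var
  I: Integer;
  C: Char;
begin
  SetLength(Result, Length(S));
  I := 0;
  for C in S do
    if not C.IsWhiteSpace then
    begin
      Inc(I);
      Result[I] := C; // assumes 1-based string indexing (desktop compilers)
    end;
  SetLength(Result, I);
end;

const
  Sample = 'a b'#9'c'#13#10'd'; // contains a space, a tab and a CR/LF pair

begin
  // Both variants produce the same text; they differ only in allocations.
  Assert(RemoveWhitespaceByAppending(Sample) = 'abcd');
  Assert(RemoveWhitespacePreallocated(Sample) = 'abcd');
  Writeln(RemoveWhitespacePreallocated(Sample));
end.
```

Both functions return the same result, so the choice between them is purely about how many times the result string has to be resized along the way.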