
I use StrUtils to split a string into a TStringDynArray, but the output is not what I expected. I will try to explain the issue:

I have a string str: 'a'; 'b'; 'c'
I called StrUtils.SplitString(str, '; ') to split the string and expected an array with three elements: 'a', 'b', 'c'

But what I got is an array with five elements: 'a', '', 'b', '', 'c'.
When I split with just ';' instead of '; ' I get three elements, but the second and third have a leading blank.

So why do I get empty strings in my first solution?

Obl Tobl
  • Read the docs. Perhaps not as expected, but it works as documented. – Rudy Velthuis Mar 07 '16 at 13:21
  • This question has some suggestions on splitting a string, based on a multi-character string (what you expected it was doing), but most of them work with string lists, not arrays: http://stackoverflow.com/questions/15424293/how-to-split-string-by-a-multi-character-delimiter – quasoft Mar 07 '16 at 18:22

3 Answers


This function is designed not to merge consecutive separators. For instance, consider splitting the following string on commas:

foo,,bar

What would you expect SplitString('foo,,bar', ',') to return? Would you be looking for ('foo', 'bar') or should the answer be ('foo', '', 'bar')? It's not clear a priori which is right, and different use cases might want different output.

In your case, you specified two delimiters, ';' and ' '. This means that

'a'; 'b'

splits at ';' and again at ' '. Between those two delimiters there is nothing, and hence an empty string is returned in between 'a' and 'b'.
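
To see this behaviour directly, here is a minimal console sketch. The program name is made up for illustration, and it assumes the XE2-style unit scope names System.Types and System.StrUtils. It prints each element in brackets so that the empty strings are visible:

```pascal
program SplitDemo;

{$APPTYPE CONSOLE}

uses
  System.Types, System.StrUtils;

var
  Parts: TStringDynArray;
  I: Integer;

begin
  // Each character of '; ' is a delimiter in its own right,
  // so the input is cut at every ';' and at every ' '.
  Parts := SplitString('''a''; ''b''; ''c''', '; ');
  Writeln('Count: ', Length(Parts));
  for I := 0 to High(Parts) do
    Writeln(I, ': [', Parts[I], ']'); // brackets expose empty elements
end.
```

For the string from the question this should report five elements, with empty brackets at indexes 1 and 3, which is the result described in the question.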

The Split method from the string helper introduced in XE3 has a TStringSplitOptions parameter. If you pass ExcludeEmpty for that parameter then consecutive separators are treated as a single separator. This program:

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  S: string;

begin
  for S in '''a''; ''b''; ''c'''.Split([';', ' '], ExcludeEmpty) do begin
    Writeln(S);
  end;
end.

outputs:

'a'
'b'
'c'

But this is not available to you in XE2, so I think you are going to have to roll your own split function, which might look like this:

function IsSeparator(const C: Char; const Separators: string): Boolean;
var
  sep: Char;
begin
  for sep in Separators do begin
    if sep=C then begin
      Result := True;
      exit;
    end;
  end;
  Result := False;
end;

function Split(const Str, Separators: string): TArray<string>;
var
  CharIndex, ItemIndex: Integer;
  len: Integer;
  SeparatorCount: Integer;
  Start: Integer;
begin
  len := Length(Str);
  if len=0 then begin
    Result := nil;
    exit;
  end;

  SeparatorCount := 0;
  for CharIndex := 1 to len do begin
    if IsSeparator(Str[CharIndex], Separators) then begin
      inc(SeparatorCount);
    end;
  end;

  SetLength(Result, SeparatorCount+1); // potentially an over-allocation
  ItemIndex := 0;
  Start := 1;
  for CharIndex := 1 to len do begin
    if IsSeparator(Str[CharIndex], Separators) then begin
      if CharIndex>Start then begin
        Result[ItemIndex] := Copy(Str, Start, CharIndex-Start);
        inc(ItemIndex);
      end;
      Start := CharIndex+1;
    end;
  end;

  // Copy the trailing item; it may be a single character, hence >=
  if len>=Start then begin
    Result[ItemIndex] := Copy(Str, Start, len-Start+1);
    inc(ItemIndex);
  end;

  SetLength(Result, ItemIndex);
end;

Of course, all of this assumes that you want a space to act as a separator. You've asked for that in the code, but perhaps you actually want just ; to act as a separator. In that case you probably want to pass ';' as the separator, and trim the strings that are returned.
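
A minimal sketch of that last suggestion, which should also work in XE2 because it relies only on SplitString and Trim. SplitAndTrim and the program name are hypothetical names chosen for this example; they are not RTL routines:

```pascal
program TrimDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Types, System.StrUtils;

// Hypothetical helper: split on one delimiter character, then
// strip leading and trailing whitespace from every element.
function SplitAndTrim(const Str: string; const Delimiter: Char): TStringDynArray;
var
  I: Integer;
begin
  Result := SplitString(Str, Delimiter);
  for I := 0 to High(Result) do
    Result[I] := Trim(Result[I]);
end;

var
  S: string;

begin
  for S in SplitAndTrim('''a''; ''b''; ''c''', ';') do
    Writeln('[', S, ']');
end.
```

Unlike the hand-rolled Split above, this keeps empty elements: an input such as 'a;;b' still yields an empty string in the middle, so filter those out afterwards if they are not wanted.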

David Heffernan

SplitString is defined as

function SplitString(const S, Delimiters: string): TStringDynArray;

One might think that Delimiters denotes a single delimiter string used to split the input, but it actually denotes a set of single characters. Each character in the Delimiters string is used as one of the possible delimiters.

SplitString

Splits a string into different parts delimited by the specified delimiter characters. S is the string to be split. Delimiters is a string containing the characters defined as delimiters.
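
If the behaviour you expected is wanted, that is, treating '; ' as one exact delimiter string, a small helper built on PosEx is one way to get it in XE2. This is only a sketch: SplitOnString and the program name are hypothetical, and the choice to keep empty parts is an assumption made for this example:

```pascal
program SplitOnStringDemo;

{$APPTYPE CONSOLE}

uses
  System.Types, System.StrUtils;

// Hypothetical helper, not part of the RTL: splits Str wherever the
// complete Delimiter string occurs. Empty parts are kept.
function SplitOnString(const Str, Delimiter: string): TStringDynArray;
var
  Count, Start, P: Integer;
begin
  Result := nil;
  if Str = '' then
    Exit;
  if Delimiter = '' then begin
    // Nothing to split on: return the whole string as one element.
    SetLength(Result, 1);
    Result[0] := Str;
    Exit;
  end;

  Count := 0;
  Start := 1;
  P := PosEx(Delimiter, Str, Start);
  while P > 0 do begin
    SetLength(Result, Count + 1);
    Result[Count] := Copy(Str, Start, P - Start);
    Inc(Count);
    Start := P + Length(Delimiter); // skip the whole delimiter
    P := PosEx(Delimiter, Str, Start);
  end;

  // Whatever follows the last delimiter (possibly an empty string).
  SetLength(Result, Count + 1);
  Result[Count] := Copy(Str, Start, MaxInt);
end;

var
  S: string;

begin
  for S in SplitOnString('''a''; ''b''; ''c''', '; ') do
    Writeln('[', S, ']');
end.
```

With the string from the question this should print three bracketed items and no empty ones. Growing the array one element at a time keeps the sketch short; for large inputs, counting the delimiters first would avoid the repeated reallocation.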

Dalija Prasnikar
  • I assume they would have called it `Delimiter` (singular) then, not `Delimiters`. FWIW, in later versions, `TStringHelper` has a version of `Split` that also takes a string as delimiter, not just chars, but unfortunately not in XE2. – Rudy Velthuis Mar 07 '16 at 13:23
  • @RudyVelthuis Agreed. But the fine line between the meanings of Delimiter and Delimiters may be lost if you are not a native English speaker. Besides that, split operations in other languages usually take the complete, exact delimiter given, so this Delphi implementation is rather confusing from that aspect too. – Dalija Prasnikar Mar 07 '16 at 13:36
  • @RudyVelthuis, but Split also has its own set of quirks: http://stackoverflow.com/questions/28410901/string-split-works-strange-when-last-value-is-empty – Ken Bourassa Mar 07 '16 at 14:29
  • @KenBourassa: I know. When I was writing an AnsiString replacement, I tested it against the existing TStringHelpers, and found (and reported) a few issues, especially with Split and Join. These were all fixed later on, though. – Rudy Velthuis Mar 07 '16 at 16:27
  • @RudyVelthuis The issue that Ken refers to remains in Seattle. As do others. – David Heffernan Mar 07 '16 at 17:01

It is because the second parameter of SplitString is a list of single character delimiters, so '; ' means split at a ';' OR split at a ' '. So the string is split at every ';' and at every space, and between the ';' and the ' ' there is nothing, hence the empty strings.

Dsm