0

I would expect the following Java code to split a string into three items:

    String csv = "1,2,";
    String[] tokens = csv.split(",");
    System.out.println(tokens.length);

However, I am only getting two items.

I admit I have not analyzed this very deeply, but it seems counter-intuitive to me. Both Python and C# produce three items. In Python:

    def test_split(self):
        line = '1,2,'
        tokens = line.split(",")
        for token in tokens:
            print('-' + token)

Output:

    -1
    -2
    -

and in C#:

    [Test]
    public void t()
    {
        string s = "1,2,";
        var tokens = s.Split(',');
        foreach (var token in tokens)
        {
            Console.WriteLine("-" + token);
        }
    }

Output:

    -1
    -2
    -

What am I missing?

This is Java 1.8.0_101.

AlexC
  • The character after the second character is `' '` - which is an incorrect representation of an empty character in Java (8). The correct way is `'\u0000'`. If you change the String csv to `"1,2,\u0000"`, then the java code will give the same output as your Python code does. – progyammer Sep 12 '16 at 14:48

2 Answers

6

Use the overloaded version of the method and pass a negative limit, which keeps trailing empty strings:

    String[] tokens = csv.split(",", -1);
AdamSkywalker
4

The documentation for the one-argument String.split(String regex) is clear on this behavior:

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
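
To make the quoted behavior concrete, here is a minimal sketch comparing the one-argument call with explicit limits of zero and -1 on the string from the question. The class name `SplitLimitDemo` and the helper `countTokens` are hypothetical names chosen for illustration only; the `split` calls themselves are the standard `java.lang.String` methods.

```java
import java.util.Arrays;

public class SplitLimitDemo {

    // Hypothetical helper (not part of the JDK): number of tokens
    // produced when splitting on "," with the given limit.
    static int countTokens(String input, int limit) {
        return input.split(",", limit).length;
    }

    public static void main(String[] args) {
        String csv = "1,2,";

        // One-argument form: behaves like a limit of zero,
        // so the trailing empty string is dropped.
        System.out.println(Arrays.toString(csv.split(",")));     // [1, 2]

        // Explicit limit of zero: same result as above.
        System.out.println(Arrays.toString(csv.split(",", 0)));  // [1, 2]

        // Negative limit: trailing empty strings are kept.
        System.out.println(Arrays.toString(csv.split(",", -1))); // [1, 2, ]

        System.out.println(countTokens(csv, 0));   // 2
        System.out.println(countTokens(csv, -1));  // 3
    }
}
```

With a limit of -1 the result has three elements, the last being an empty string, which matches the Python and C# output shown in the question.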

jpw