-2

I am writing a Java program in which a tab separated values (TSV) file containing two columns of information is read by a BufferedReader and then split into two components (which will serve as [key,value] pairs in a HashMap later in the program) using String.split("\t"). Let's say the first line of the TSV file is as follows:

Key1\tHello world\nProgramming is cool\nGoodbye

The code shown below would separate this line into "Key1" and "Hello world\nProgramming is cool\nGoodbye":

File file = new File("sample.tsv");
BufferedReader br = new BufferedReader(new FileReader(file));
String s = br.readLine();
// split() returns a new array, so no array needs to be allocated beforehand
String[] tokens = s.split("\t");

The problem now comes in trying to print the second string (i.e. tokens[1]).

System.out.println(tokens[1]);

The line of code above results in the second string being printed with the newline characters (\n) being ignored. In other words, this is printed...

Hello world\nProgramming is cool\nGoodbye

...instead of this...

Hello world

Programming is cool

Goodbye

If I create a new string with the same text as above and use the String.equals() method to compare the two, it returns false.

String str = "Hello world\nProgramming is cool\nGoodbye";
boolean sameString = str.equals(tokens[1]);    // false

Why can't special characters in the strings returned by String.split() be printed properly?

  • 2
    Likely the file has literal `\n` text within it, not new line chars. If so then perhaps you want to do a `replaceAll(...)` on your String before printing. – Hovercraft Full Of Eels May 27 '17 at 03:16
  • 2
    I'm pretty confused about what you're asking. But if Hovercraft is right, then it needs to be pointed out that character sequences with backslashes, such as `\n`, are not treated specially when doing input. They're treated specially inside string and character literals in a Java program, because the Java compiler looks for them and interprets them specially. But Java I/O methods don't do that. – ajb May 27 '17 at 03:22
  • Yes, @HovercraftFullOfEels is correct. I was able to resolve my issue using `replaceAll(...)`, although I couldn't figure out how to get this method to find the literal `\n`. I ended up changing all occurrences of `\n` in the file to something else that the `replaceAll(...)` method could find (i.e. ``). – Isaac Loegering May 28 '17 at 04:53
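
For anyone stuck at the same point as the last comment, here is a minimal sketch of how the literal two-character sequence `\n` could be matched directly. It assumes the file really contains a backslash followed by the letter n. The class and method names are made up for illustration. `String.replace` takes plain text, so the search string only needs the usual Java escaping; `replaceAll` takes a regular expression, so the backslash has to be escaped a second time for the regex engine.

```java
public class UnescapeNewlines {

    // Hypothetical helper: turns backslash + 'n' into a real line feed.
    // String.replace treats its arguments as literal text, so "\\n" is
    // simply the two characters '\' and 'n'.
    static String unescapeNewlines(String raw) {
        return raw.replace("\\n", "\n");
    }

    // Same idea with replaceAll. The Java literal "\\\\n" is the regex \\n,
    // which matches one backslash followed by 'n'.
    static String unescapeNewlinesRegex(String raw) {
        return raw.replaceAll("\\\\n", "\n");
    }

    public static void main(String[] args) {
        // Stand-in for tokens[1] as read from the file (literal backslash-n pairs).
        String fromFile = "Hello world\\nProgramming is cool\\nGoodbye";
        System.out.println(unescapeNewlines(fromFile));
    }
}
```

With the sample value above, `main` prints the three phrases on separate lines.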

2 Answers

0

BufferedReader.readLine() read your string as one line, because that is how it is stored in the file. The reader did not see "\n" as the single line feed character, ASCII 10 (0x0A); it saw two separate characters: a backslash, ASCII 92 (0x5C), followed by the letter n, ASCII 110 (0x6E).
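
A small sketch that may make the difference visible: it lists the decimal code of every character in a string. The class and helper names are hypothetical, and the second string is only a stand-in for what readLine() would hand back from such a file.

```java
public class EscapeCodes {

    // Hypothetical helper: returns the decimal code of each character,
    // separated by spaces.
    static String codes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append((int) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String inSource = "\n";   // the compiler turns this escape into one line feed
        String fromFile = "\\n";  // two characters: backslash and 'n', as found in the file

        System.out.println(codes(inSource));  // prints 10
        System.out.println(codes(fromFile));  // prints 92 110
    }
}
```

This is also why String.equals() returns false in the question: one string has a single character where the other has two.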

If you type the input file the way you expect to see it with your text editor, it will print the way you expect.

On a Unix-like system (using bash, whose built-in echo interprets backslash escapes only when given -e):

echo -e "Hello world\nProgramming is cool\nGoodbye" > InputFile.result_you_want

echo "Hello world\nProgramming is cool\nGoodbye" > InputFile.result_you_get

You could use a program like echo to convert your TSV, but then you will need to split on the "\t" character, ASCII(9) 0x09, and not a literal "\t".

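split() takes a regular expression, so there are two ways to write the tab. With "\t" the Java compiler puts a real tab character into the pattern; with "\\t" the regex engine receives \t and interprets it as a tab itself. Either one splits on a real tab character.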
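
As a rough illustration of the two spellings, the sketch below splits a sample line both ways. The class and method names are invented for the example, and the limit argument of 2 is my own addition so that any further tabs stay inside the value; the question's code calls split("\t") without a limit.

```java
public class TabSplit {

    // "\t" in Java source is a real tab character, used here as the pattern.
    static String[] splitOnTabChar(String line) {
        return line.split("\t", 2);
    }

    // "\\t" reaches the regex engine as \t, which the engine reads as a tab.
    static String[] splitOnRegexEscape(String line) {
        return line.split("\\t", 2);
    }

    public static void main(String[] args) {
        // A real tab after Key1, but literal backslash-n pairs in the value,
        // mirroring the first line of the file in the question.
        String line = "Key1\tHello world\\nProgramming is cool\\nGoodbye";

        String[] tokens = splitOnTabChar(line);
        System.out.println(tokens[0]);  // Key1
        System.out.println(tokens[1]);  // Hello world\nProgramming is cool\nGoodbye
    }
}
```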

If this is for work, you may want to use a tool or library so that you don't have to convert your file with echo. The question "String parsing in Java with delimeter tab "\t" using split" has some suggestions.

Searching for Java CSV libraries could also be useful. Most of them let you set the delimiter character and the line-ending format.

RileyR
-1

Because, as far as the computer is concerned, the text '\n' (a backslash followed by the letter n) is not the same as the newline character '\n'.

I think the first line of your file looks like this: key1 Hello world\nProgramming\ncool

So split() can still split on the tab, but when it comes to printing, you only see the text '\n', not the newline character that would actually start a new line.

herokingsley