
I am uploading CSV files to Teradata using JDBC. Everything used to be fine until recently, when I came across a CSV file that had some weird characters and my code failed to upload it.

I opened the CSV file in Notepad++, and the characters look like this: SUB. When I open it in Excel, they look like this: ->->

When I manually deleted those characters, everything went back to normal. Is there any way I could use Java to clean a CSV file and remove all kinds of invalid characters?

Borat Sagddiev
2 Answers


You can try:

myString = myString.replaceAll("\\p{C}", "?");

If you want to remove them instead:

myString = myString.replaceAll("\\p{C}", "");
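One caveat when applying this to a whole CSV file: `\p{C}` also matches CR, LF and TAB (they are control characters), so running it over the entire file contents would join every row into a single line. Below is a minimal sketch, with a hypothetical class name `CsvCleaner`, that uses Java's character-class intersection to remove every `\p{C}` character except those three. Note that it also removes format characters such as a U+FEFF byte-order mark.

```java
import java.util.regex.Pattern;

public class CsvCleaner {
    // \p{C} ("other": control, format, unassigned, private-use, lone surrogates)
    // intersected with "not CR, LF or TAB", so row and field separators survive.
    private static final Pattern INVISIBLE_EXCEPT_WHITESPACE =
            Pattern.compile("[\\p{C}&&[^\\r\\n\\t]]");

    /** Returns a copy of text with every \p{C} character removed except CR, LF and TAB. */
    public static String clean(String text) {
        return INVISIBLE_EXCEPT_WHITESPACE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // A two-row CSV with a SUB (0x1A) character embedded in the second row.
        String csv = "id,name\r\n1,Bo" + (char) 0x1A + "rat\r\n";
        // Prints both rows, line breaks intact, with the SUB character gone.
        System.out.print(clean(csv));
    }
}
```

To clean a file, you would read it into a String with its actual encoding, pass it through `clean`, and write the result to a new file before handing that file to the JDBC upload.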

More here: How can I replace non-printable Unicode characters in Java?

Quark

The SUB character is ASCII 26 (hex 0x1A). Back when DEC-10s ruled the earth, this was called Ctrl-Z. It was traditionally used to mark the end of a text file.

If it is indeed at the end of the file, you can read the file in using a Java InputStream (and please have a look at Read/convert an InputStream to a String) and drop that terminal Ctrl-Z yourself; an InputStream returns it like any other byte.
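A minimal sketch of dropping a terminal Ctrl-Z, assuming the file is UTF-8 (the class name `TrailingSub` is hypothetical). It reads the raw bytes and explicitly removes any run of 0x1A bytes at the very end, so the result does not depend on how a particular reading API treats them; SUB bytes elsewhere in the data are left alone.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class TrailingSub {
    private static final byte SUB = 0x1A; // ASCII 26, Ctrl-Z

    /** Returns a copy of data with any run of SUB bytes at the very end removed. */
    public static byte[] stripTrailingSub(byte[] data) {
        int end = data.length;
        while (end > 0 && data[end - 1] == SUB) {
            end--;
        }
        return Arrays.copyOf(data, end);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(args[0])); // path to the CSV file
        // Assumes UTF-8; substitute the file's real encoding if it differs.
        String text = new String(stripTrailingSub(raw), StandardCharsets.UTF_8);
        System.out.print(text);
    }
}
```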

It would be quite unusual (and a problem) to have the SUB inside the CSV data, unless it were representing a binary object.
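To tell the two cases apart, you could scan the file for SUB bytes that are not part of the trailing run. A minimal sketch (the class and method names are hypothetical) that reports the byte offsets of every embedded SUB:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SubScanner {
    private static final byte SUB = 0x1A; // ASCII 26, Ctrl-Z

    /** Returns the byte offsets of every SUB that is not part of the trailing run of SUBs. */
    public static List<Integer> embeddedSubOffsets(byte[] data) {
        int end = data.length;
        while (end > 0 && data[end - 1] == SUB) {
            end--; // skip the terminal Ctrl-Z run, which is the expected case
        }
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < end; i++) {
            if (data[i] == SUB) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0])); // path to the CSV file
        List<Integer> offsets = embeddedSubOffsets(data);
        if (offsets.isEmpty()) {
            System.out.println("No SUB characters inside the data.");
        } else {
            System.out.println("SUB found inside the data at byte offsets " + offsets);
        }
    }
}
```

If this reports offsets, the SUB is inside the data, and a tool that treats Ctrl-Z as end-of-file will stop reading at the first one.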

rajah9
  • Thank you. I had a feeling it was the end-of-file flag, because I tried to import it into BI tools like SAS, and it was imported only up to the first `SUB`. – Borat Sagddiev Feb 15 '15 at 02:09