Reading binary chars from a CSV file

Question

I have a strange problem: I have a CSV file that opens correctly in Notepad and MS Excel 2010.

I tried reading the rows of this file with this code:

BufferedReader source = new BufferedReader(new FileReader(fileName));
String currentRow = null;
while (null != (currentRow=source.readLine())){
    System.out.println(currentRow);
}

When the program runs, I get only binary-looking characters, and the row lengths are wrong: I expect about 2000 characters per line, but I get 55 characters or 1 character.

I work in Eclipse: if I open this CSV file with the text editor I see strange characters; if I open it with the system editor, MS Excel shows the correct values.

The file type is reported as a Microsoft Excel comma-separated values file. Does this file contain binary characters?

I tried to use Apache POI (reading the file both as CSV and as XLS) with this code:

public void displayFromExcel (String xlsPath){
    POIFSFileSystem fileSystem = null;
    try{
        fileSystem = new POIFSFileSystem (new FileInputStream (xlsPath));
        HSSFWorkbook workBook = new HSSFWorkbook (fileSystem);
        HSSFSheet sheet = workBook.getSheetAt (0);
        Iterator<Row> rows = sheet.rowIterator();

        while (rows.hasNext ()){
            HSSFRow row = (HSSFRow) rows.next ();
            System.out.println ("Row No.: " + row.getRowNum ());
            Iterator<Cell> cells = row.cellIterator();
            while (cells.hasNext ()){
                HSSFCell cell = (HSSFCell) cells.next ();

                System.out.println ("Cell No.: " + cell.getCellNum ());

                switch (cell.getCellType ()){
                    case HSSFCell.CELL_TYPE_NUMERIC :
                        System.out.println ("Numeric value: " + cell.getNumericCellValue ());
                        break;
                    case HSSFCell.CELL_TYPE_STRING :
                        HSSFRichTextString richTextString = cell.getRichStringCellValue ();
                        System.out.println ("String value: " + richTextString.getString ());
                        break;
                    default :
                        System.out.println ("Type not supported.");
                        break;
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace ();
    }
}

It doesn't work; I get this message on the console:

java.io.IOException: Invalid header signature; read 0x003000310030FEFF, expected 0xE11AB1A1E011CFD0
    at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:125)
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:153)

The exception is thrown when this instruction runs:

POIFSFileSystem fileSystem = new POIFSFileSystem (new FileInputStream (xlsPath));

I also tried the datafile library and plain Java I/O (DataInputStream, etc.), but without success.

Any ideas for a solution?

I'd guess your Excel file header is incorrect, based on the error you posted — Adrian, Feb 14 '12 at 15:30

score 1 · Accepted Answer · edited May 23 '17 at 12:31

You need to read this file with something more complicated than FileReader. Check out How to reliably guess encoding. Then either find something that will read the file as encoded or write something that will filter out the junk. I have found that if you treat a file as straight ASCII and throw out everything that's not a valid ASCII character, it will read a straight Unicode file (as well as a straight ASCII file) quite nicely. If it's UTF-8 with Egyptian Hieroglyphics (and you want those Hieroglyphics) this doesn't work so well.
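The "throw out everything that's not a valid ASCII character" idea can be sketched roughly as below. This is only an illustration, not a tested utility: the class and method names (AsciiFilter, keepPrintableAscii) are made up for this example, and it assumes that "junk" means every byte outside printable ASCII plus tab, CR and LF, so the NUL padding bytes and the byte order mark of a UTF-16 file are dropped as well. It only gives sensible output when the real content is plain ASCII.

```java
import java.nio.charset.StandardCharsets;

class AsciiFilter {

    /**
     * Hypothetical helper: keeps printable ASCII (0x20..0x7E) plus tab, CR and LF,
     * and silently drops every other byte (NUL padding, BOM bytes, high bytes).
     */
    static String keepPrintableAscii(byte[] raw) {
        StringBuilder kept = new StringBuilder(raw.length);
        for (byte b : raw) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean printable = c >= 0x20 && c <= 0x7E;
            boolean lineOrTab = c == '\t' || c == '\r' || c == '\n';
            if (printable || lineOrTab) {
                kept.append((char) c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // Sample bytes: a UTF-16LE byte order mark followed by "010,abc" and CRLF.
        byte[] sample = "\uFEFF010,abc\r\n".getBytes(StandardCharsets.UTF_16LE);
        System.out.print(keepPrintableAscii(sample));
    }
}
```

To apply it to a real file, the bytes could come from Files.readAllBytes (Java 7 or later), with the result then split into lines. Any non-ASCII character is lost or mangled by this filter, which is the limitation mentioned above.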

So first try to get "them" to give you a better file. When that doesn't work, do some research in the java.io Javadoc and then do some programming.
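As a starting point for that programming: POI reads the 8-byte header as a little-endian number, so the signature in the question's stack trace, 0x003000310030FEFF, corresponds to the bytes FF FE 30 00 31 00 30 00. FF FE is the UTF-16 little-endian byte order mark, and the rest decodes to the text "010", which suggests the file is UTF-16 text rather than an XLS workbook. Under that assumption, a minimal sketch for reading the rows is to pass an explicit charset instead of relying on FileReader's platform default. The class name Utf16CsvReader is made up for this example, and the code assumes Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class Utf16CsvReader {

    /**
     * Reads all rows of a UTF-16 text file. The UTF-16 charset uses the byte
     * order mark to pick the byte order (big-endian if there is none).
     */
    static List<String> readRows(Path file) throws IOException {
        List<String> rows = new ArrayList<>();
        try (BufferedReader source = Files.newBufferedReader(file, StandardCharsets.UTF_16)) {
            String currentRow;
            while ((currentRow = source.readLine()) != null) {
                rows.add(currentRow);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java Utf16CsvReader <file>");
            return;
        }
        for (String row : readRows(Paths.get(args[0]))) {
            System.out.println(row.length() + " chars: " + row);
        }
    }
}
```

If the file turns out to use a different encoding, only the charset argument needs to change. Splitting each row into fields is left out here; a file saved from Excel as Unicode text is often tab-separated rather than comma-separated, so the delimiter is worth checking first.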
