0

My application processes excel files; but when opening xlsx Excel format, it gives me the following error. Is there something I'm doing wrong or anything I've left out?. Below is the code and the error stack trace; I've traced the error's origin to here:

myExcelBook = new XSSFWorkbook(new FileInputStream(f));
myExcelSheet = ((XSSFWorkbook) myExcelBook).getSheetAt(0);

The exception stack trace:

Caused by: java.io.CharConversionException: Characters larger than 4 bytes are not supported: byte 0x8f implies a length of more than 4 bytes
    at org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:162)
    at org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader$FastStreamDecoder.read(XMLStreamReader.java:762)
    at org.apache.xmlbeans.impl.piccolo.xml.XMLStreamReader.read(XMLStreamReader.java:162)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yy_refill(PiccoloLexer.java:3477)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:3962)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
    at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1277)
    at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.SstDocument$Factory.parse(Unknown Source)
    at org.apache.poi.xssf.model.SharedStringsTable.readFrom(SharedStringsTable.java:119)
    at org.apache.poi.xssf.model.SharedStringsTable.<init>(SharedStringsTable.java:106)
    at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:67)
    ... 61 more

I wanted to get the number of columns and cell data from the xlsx file for processing in my application, but when I load the file in the JavaFx application, the error shows up and the application freezes.

trashgod
  • 203,806
  • 29
  • 246
  • 1,045
  • 1
    Are you sure that input is valid xls(x) file and not a binary xlsb format? – zforgo Mar 09 '23 at 08:56
  • 2
    You are using an ancient version of xmlbeans. `piccolo` in your stacktace: removed years ago. You really would have a better chance of success if you tried to use up to date jars. – PJ Fanning Mar 09 '23 at 10:10
  • @zforgo, It is a valid xlsx file containing some demo data from twitter. – NebuCodeNezzar Mar 09 '23 at 10:46
  • 1
    As an aside, this really has nothing to do with JavaFX (and I see jewelsea removed the [tag:javafx] tag). Mentioning it only confuses the issue. The stack trace shows the error is entirely about _xmlbeans_ inability to parse your xlsx file. Which, again, is separate from JavaFX. I bring this up because understanding the scope of the problem can help find a solution. – Slaw Mar 09 '23 at 11:05
  • 1
    That said, looking at [this Q&A](https://stackoverflow.com/q/52810296/6395627), [this Q&A](https://stackoverflow.com/q/31296635/6395627), and [this Q&A](https://stackoverflow.com/q/53002856/6395627), it would appear your error is caused by an encoding issue and/or your file is really an xlsb file and not an xlsx file (remember, just because your file ends in `.xlsx` does not necessarily mean the file is in xlsx format). Also, make sure you're using the latest versions of the library as possible. – Slaw Mar 09 '23 at 11:09
  • as @PJFanning said, I was using an outdated version of apache poi libraries in my pom.xml dependencies. – NebuCodeNezzar Mar 24 '23 at 05:52

1 Answers1

0

The issue here is that I was using an outdated version of the apache poi and XML beans libraries.By upgrading to the apache poi version 5.2.3 , org.apache.xmlbeans version 5.1.1 and poi-ooxml version 5.2.3 , I was able to open the xlsx file. Also, I faced the error below during opening one of my larger xlsx file. Error:

  Exception in thread "main" org.apache.poi.util.RecordFormatException: Tried to read data but the maximum length for this record type is 100,000,000.
If the file is not corrupt and not large, please open an issue on bugzilla to request 
increasing the maximum allowable size for this record type.
You can set a higher override value with IOUtils.setByteArrayMaxOverride()
    at org.apache.poi.poi/org.apache.poi.util.IOUtils.throwRecordTruncationException(IOUtils.java:607)
    at org.apache.poi.poi/org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:249)
    at org.apache.poi.poi/org.apache.poi.util.IOUtils.toByteArrayWithMaxLength(IOUtils.java:220)
    at org.apache.poi.ooxml/org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:81)
    at org.apache.poi.ooxml/org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:98)
    at org.apache.poi.ooxml/org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:132)
    at org.apache.poi.ooxml/org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:319)
    at org.apache.poi.ooxml/org.apache.poi.ooxml.util.PackageHelper.open(PackageHelper.java:59)
    at org.apache.poi.ooxml/org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:290)
    at org.apache.poi.ooxml/org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:286)
    at com.example.scrapper/com.example.scrapper.ChangeToCSV.main(ChangeToCSV.java:15)

Process finished with exit code 1

I fixed the error by using the line below, just before declaring the Workbook and other variables to open the xlsx file: IOUtils.setByteArrayMaxOverride(1000000000); Illustration here:

IOUtils.setByteArrayMaxOverride(1000000000);
try {
     FileInputStream inputStream = new FileInputStream(xlsxFilePath);
     Workbook workbook = new XSSFWorkbook(inputStream);
     Sheet sheet = workbook.getSheetAt(0);
     //The rest of the code