0

I have a huge excel file which I am trying to parse using SAX parser in JAVA. I am mostly making use of Apache POI library and working with .XLSX files. Here is how xml contents looks inside zipped excel folder at /xl/worksheets/sheet1.xml which i am trying to read:

<row r="1">
<c r="A1" t="inlineStr"><is><t>my value 1</t></is></c>
<c r="B1" t="inlineStr"><is><t>my value 2</t></is></c>
<c r="C1" t="inlineStr"><is><t>my value 3</t></is></c>
</row>

This one particular excel file is making use of inline string values as shown above.

This my function which executes the program to read the file:

public void executeExcelDataExtraction() throws IOException, OpenXML4JException, SAXException, ParserConfigurationException, XMLStreamException, FactoryConfigurationError {
    OPCPackage pkg = OPCPackage.open(XLSX_INPUT_FILE.xlsx);
    XSSFReader r = new XSSFReader( pkg );
    SharedStringsTable sst = r.getSharedStringsTable();
    ImportArticleDataProcessorExcelFileReaderFactory handlerFactory = new 
         ImportArticleDataProcessorExcelFileReaderFactory(sst);
    XMLReader parser = fetchSheetParser(handlerFactory);
    Iterator<InputStream> sheets = r.getSheetsData();
    if (sheets instanceof XSSFReader.SheetIterator) {
        XSSFReader.SheetIterator sheetiterator = 
             (XSSFReader.SheetIterator)sheets;
        while(sheetiterator.hasNext()) {
            System.out.println("Processing new sheet:\n");
            InputStream sheet = sheets.next();
            InputSource sheetSource = new InputSource(sheet);
            parser.parse(sheetSource);
            rowCache = handlerFactory.getRowCache();
            sheet.close();
            pkg.close();
            if(!rowCache.isEmpty()) 
                createCategoryMap(rowCache);                
        }
    }
}

and this is my sheet handler factory class which is used in above function.

import java.util.LinkedList;
import java.util.List;

import org.xml.sax.Attributes;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ImportArticleDataProcessorExcelFileReaderFactory extends DefaultHandler{

    private static final String ROW_EVENT = "row";
    private static final String CELL_EVENT = "c";
    private SharedStringsTable sst;
    private String lastContents;
    private boolean nextIsString;
    private List<String> cellCache = new LinkedList<>();
    private List<String[]> rowCache = new LinkedList<>();

    ImportArticleDataProcessorExcelFileReaderFactory(SharedStringsTable sst) {
        this.sst = sst;
    }

    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        // c => cell
        if (CELL_EVENT.equals(name)) {
            String cellType = attributes.getValue("t");
            if(cellType != null && cellType.equals("s")) {
                nextIsString = true;
            } else {
                nextIsString = false;
            }
        } else if (ROW_EVENT.equals(name)) {
            if (!cellCache.isEmpty()) {
                rowCache.add(cellCache.toArray(new String[cellCache.size()]));
            }
            cellCache.clear();
        }
        lastContents = "";
    }

    public void endElement(String uri, String localName, String name)
            throws SAXException {
        // Process the last contents as required.
        // Do now, as characters() may be called more than once
        if(nextIsString) {
            int idx = Integer.parseInt(lastContents);
            lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
            nextIsString = false;
        }

        // v => contents of a cell
        // Output after we've seen the string contents
        if(name.equals("v")) {
            cellCache.add(lastContents.trim());
        }
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        lastContents += new String(ch, start, length);
    }

    public List<String[]> getRowCache() {
        return rowCache;
    }

}

All other excel files which are not having inline string are able to read successfully, however with files having inline string inside, the algorithm only reads cellType=inlineStr but never gets the right value.

What I want:

All I want is just to print the values located inside inline string cell e.g. in my case it is "my value 1", "my value 2" and "my value 3"

Radheya
  • 779
  • 1
  • 11
  • 41
  • 1
    See https://stackoverflow.com/questions/48843618/how-to-check-a-number-in-a-string-contains-a-date-and-exponential-numbers-while/48889186#48889186 – Axel Richter Jun 14 '19 at 12:01
  • @AxelRichter wow! this answer solved my problem. I did few changes as shown in that answer. thanks a ton! – Radheya Jun 14 '19 at 12:50

1 Answers1

0

if anyone is looking for similar solution, just want to let you know that i have solved it by adding these few lines in my ImportArticleDataProcessorExcelFileReaderFactory class above:

public void startElement(String uri, String localName, String name){

        // rest of the code...

        inlineStr = false;
        if(cellType != null && cellType.equals("inlineStr")) {
            inlineStr = true;
        } 
        ...
}

public void endElement(String uri, String localName, String name){

        // rest of the code...

       if(name.equals("t") && inlineStr) {
          cellCache.add(lastContents.trim());
       }
       ...
}

void characters function in above factory class correctly identifies contents of the cell and my changes as given in my answer successfully fills cellCache list with all the values from inline string cells.

please refer to @Axel's answer in comment above and follow this answer for your source: How to check a number in a string contains a date and exponential numbers while parsing excel file using apache event model in java

Radheya
  • 779
  • 1
  • 11
  • 41