0

I am parsing an excel file which contains many dates like 13-4-2021 and some numbers in this 3,7%,2,65% format.So i am parsing that excel file and i am getting the data in a string to write them in a text file.So my problem is that i am getting the date in a whole number like 44299, while it is actually in 04/13/2021 format in the excel sheet.And another case is i have some numbers with percentage like 3,7%,2,65% which are coming like 3.6999999999999998E-2.So i can convert the number to a date using

SimpleDateFormat("MM/dd/yyyy").format(javaDate)

Here is the code i am using

private static class SheetHandler extends DefaultHandler {
    private SharedStringsTable sst;
    private String lastContents;
    private boolean nextIsString;
    private int rowNumber;

    private SheetHandler(SharedStringsTable sst) {
        this.sst = sst;
    }

    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        try {

            // row => row
            if(name.equals("row")) {
                   if (attributes.getValue("r") != null) {
                    rowNumber = Integer.valueOf(attributes.getValue("r"));
                   } else {
                    rowNumber++;
                   }
                   //System.out.println("row: " + rowNumber);
                  }

        if (rowNumber > 6) {

        // c => cell
        if(name.equals("c")) {
            // Print the cell reference 

            //System.out.print(attributes.getValue("r") + " - ");
            // Figure out if the value is an index in the SST
            String cellType = attributes.getValue("t");
            if(cellType != null && cellType.equals("s")) {
                nextIsString = true; 
            } else {
                nextIsString = false;
              }

        }
        // Clear contents cache
        lastContents = "";
        }
        }catch(Exception e) {
            e.printStackTrace();
        }
    }

    public void endElement(String uri, String localName, String name)
            throws SAXException {
        // Process the last contents as required.
        // Do now, as characters() may be called more than once

        if (rowNumber > 6) {


        if(nextIsString) {
            int idx = Integer.parseInt(lastContents);

            lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
            nextIsString = false;
        }
        // v => contents of a cell
        // Output after we've seen the string contents
        if(name.equals("v")) {
           // System.out.println(lastContents);

            if(!lastContents.isEmpty() ) // Here i am putting the values to a list to process 

                pickUpExcelValues.add(lastContents);
            }
        }
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        lastContents += new String(ch, start, length);
    }
}

But how i will check the string contains lets say 44299 is a date? And also i have no idea how to convert this 3.6999999999999998E-2 to 3,7% while writing to a text file.If anybody have any idea please help.

Axel Richter
  • 56,077
  • 6
  • 60
  • 87
Mandrek
  • 1,159
  • 6
  • 25
  • 55
  • Help us to help you; **post the java code that processes the Excel file**. – Gary's Student Feb 17 '18 at 16:53
  • @Gary'sStudent please check the post i have edited – Mandrek Feb 17 '18 at 16:59
  • @Gary'sStudent can you please check now ? – Mandrek Feb 18 '18 at 11:53
  • I have reviewed your code. I am attempting to devise a way to retrieve a **formatted** cell. It may take some time time – Gary's Student Feb 18 '18 at 11:55
  • No problem I thought that you haven't got time to look – Mandrek Feb 18 '18 at 11:56
  • @Gary'sStudent if you can manage the date field that is also ok , i dont need the exponential part for now – Mandrek Feb 19 '18 at 08:37
  • 1
    Is that just me not understanding your question? If your number is between 0 and 1, then it’s a percentage (which isn’t the same as exponential). If it’s an integer, then it may be number of days since December 30, 1899; but you should decide a reasonable range for your dates and discard those that fall outside. – Ole V.V. Feb 19 '18 at 09:52
  • Similarly to what @OleV.V. suggested, I'd check if the string contains a dot (or an `E`) - if it does, you need to round it to whatever decimal degree you're interested in. Otherwise, it's a date in a constant format – GalAbra Feb 19 '18 at 10:02
  • @OleV.V. i am getting the dates as a number like 43256,46586. So before putting them into the list i need to convert them into their original form – Mandrek Feb 19 '18 at 10:02
  • @OleV.V. i have not understood your solution , can you write some code for that if possible? – Mandrek Feb 19 '18 at 10:04
  • @GalAbra am i able to explain my situation ? – Mandrek Feb 19 '18 at 10:08
  • You shouldn’t want to use `SimpleDateFormat`. That class is not only long outdated, it is also notoriously troublesome. I recommend you use [`java.time`, the modern Java date and time API,](https://docs.oracle.com/javase/tutorial/datetime/) instead. And its `DateTimeFormatter` class. – Ole V.V. Feb 19 '18 at 13:10

3 Answers3

1

This question needs further explanations.

At first, it is related to How to skip the rows in a xlsm file using apache event user model which was answered.

But if one wants using the examples from XSSF and SAX (Event API) then one needs basic knowledge about the XML used in the Office Open XML.

And the ExampleEventUserModel is a very low level example showing the streaming principle. For extending this to taking formats into account needs parsing the styles table too and then using DataFormatter.

The following is a complete example which is doing exactly this. But there is a fuller example, including support for fetching number formatting information and applying it to numeric cells (eg to format dates or percentages). Please see the XLSX2CSV example in svn.

import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.BuiltinFormats;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.util.SAXHelper;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;


public class ExampleEventUserModel {
 public void processOneSheet(String filename) throws Exception {
  OPCPackage pkg = OPCPackage.open(filename);
  XSSFReader r = new XSSFReader( pkg );
  SharedStringsTable sst = r.getSharedStringsTable();

  StylesTable st = r.getStylesTable();
  XMLReader parser = fetchSheetParser(sst, st);

  // To look up the Sheet Name / Sheet Order / rID,
  //  you need to process the core Workbook stream.
  // Normally it's of the form rId# or rSheet#
  InputStream sheet2 = r.getSheet("rId2");
  InputSource sheetSource = new InputSource(sheet2);
  parser.parse(sheetSource);
  sheet2.close();
 }

 public void processAllSheets(String filename) throws Exception {
  OPCPackage pkg = OPCPackage.open(filename);
  XSSFReader r = new XSSFReader( pkg );
  SharedStringsTable sst = r.getSharedStringsTable();

  StylesTable st = r.getStylesTable();
  XMLReader parser = fetchSheetParser(sst, st);

  Iterator<InputStream> sheets = r.getSheetsData();
  while(sheets.hasNext()) {
   System.out.println("Processing new sheet:\n");
   InputStream sheet = sheets.next();
   InputSource sheetSource = new InputSource(sheet);
   parser.parse(sheetSource);
   sheet.close();
   System.out.println("");
  }
 }

 public XMLReader fetchSheetParser(SharedStringsTable sst, StylesTable st) throws SAXException, ParserConfigurationException {
/*
  XMLReader parser =
  XMLReaderFactory.createXMLReader(
       "org.apache.xerces.parsers.SAXParser"
  );
*/
  XMLReader parser = SAXHelper.newXMLReader();
  ContentHandler handler = new SheetHandler(sst, st);
  parser.setContentHandler(handler);
  return parser;
 }


 /** 
  * See org.xml.sax.helpers.DefaultHandler javadocs 
  */
 private static class SheetHandler extends DefaultHandler {
  private SharedStringsTable sst;
  private StylesTable st;
  private String lastContents;
  private boolean nextIsString;
  private boolean nextIsStyledNumeric;
  private boolean inlineStr;
  private int styleIndex;
  private DataFormatter formatter;

  private int rowNumber;

  private SheetHandler(SharedStringsTable sst, StylesTable st) {
   this.sst = sst;
   this.st = st;
   this.rowNumber = 0;
   this.formatter = new DataFormatter(java.util.Locale.US, true);
   this.styleIndex = 0;
  }

  public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {

   // row => row
   if(name.equals("row")) {
    if (attributes.getValue("r") != null) {
     rowNumber = Integer.valueOf(attributes.getValue("r"));
    } else {
     rowNumber++;
    }
    System.out.println("row: " + rowNumber);
   }

   if (rowNumber > 6) {

    // c => cell
    if(name.equals("c")) {
     // Print the cell reference
     System.out.print(attributes.getValue("r") + " - ");

     String cellType = attributes.getValue("t");

     // Figure out if the value is an index in the SST
     nextIsString = false;
     if(cellType != null && cellType.equals("s")) {
      nextIsString = true;
     } 

     // Figure out if the value is an inline string     
     inlineStr = false;
     if(cellType != null && cellType.equals("inlineStr")) {
      inlineStr = true;
     } 

     // Figure out if the value is an styled numeric value or date
     nextIsStyledNumeric = false;
     if(cellType != null && cellType.equals("n") || cellType == null) {
      String cellStyle = attributes.getValue("s");
      if (cellStyle != null) {
       styleIndex = Integer.parseInt(cellStyle);
       nextIsStyledNumeric = true;
      }
     } 
    }
   }

   // Clear contents cache
   lastContents = "";
  }

  public void endElement(String uri, String localName, String name)
            throws SAXException {
   if (rowNumber > 6) {

    // Process the last contents as required.
    // Do now, as characters() may be called more than once

    // If the value is in the shared string table, get it
    if(nextIsString) {
     int idx = Integer.parseInt(lastContents);
     lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
     nextIsString = false;
    }

    // v => contents of a cell
    // Output after we've seen the string contents
    if(name.equals("v") || (inlineStr && name.equals("c"))) {
     // If the value is styled numeric, use DataFormatter to formaat it
     if (nextIsStyledNumeric) {
      XSSFCellStyle style = st.getStyleAt(styleIndex);
      int formatIndex = style.getDataFormat();
      String formatString = style.getDataFormatString();
      if (formatString == null) {
       // formatString could not be found, so it must be a builtin format.
       formatString = BuiltinFormats.getBuiltinFormat(formatIndex);
      }
      double value = Double.valueOf(lastContents);
      lastContents = formatter.formatRawCellContents(value, formatIndex, formatString);
      nextIsStyledNumeric = false;
     } 
     // Print out the contents
     System.out.println(lastContents);
    }
   }
  }

  public void characters(char[] ch, int start, int length)
            throws SAXException {
   //collect each character part to the content
   lastContents += new String(ch, start, length);
  }
 }

 public static void main(String[] args) throws Exception {
  ExampleEventUserModel example = new ExampleEventUserModel();
  //example.processOneSheet(args[0]);
  example.processAllSheets(args[0]);
 }
}
Axel Richter
  • 56,077
  • 6
  • 60
  • 87
0

This function wraps the two cases (a percentage or a date):

private static String convert(String s) {
    if(s.indexOf('E') > -1) {   // It's a percentage
        String[] components = s.split("E");
        double num = Double.parseDouble(components[0]) * Math.pow(10, Integer.parseInt(components[1]));
        //return String.valueOf(num);                   // will return i.e. "0.037"
        return Math.round(num * 10000.0) / 100.0 + "%"; // will return i.e. "3.7%"
    }
    else {  // It's a date
        SimpleDateFormat sdf = new SimpleDateFormat("MM/dd/yyyy");
        GregorianCalendar gc = new GregorianCalendar(1900, 0, 0);
        gc.add(Calendar.DATE, Integer.parseInt(s) - 1);
        Date date = gc.getTime();
        return sdf.format(date);
    }
}

Note that the serial number dates in Excel represent the days that have elapsed since January 1st 1900, hence the conversion I've used.

Let me know how it works for you

GalAbra
  • 5,048
  • 4
  • 23
  • 42
  • Yes, I had something like that in mind. Elegant rounding. Can we be sure that a percentage would have an `E` in it? It could be 0? It could have a lowercase `e`? And as I have said already, I discourage `SimpleDateFormat` and `GregorianCalendar`. – Ole V.V. Feb 19 '18 at 13:14
  • 1
    @OleV.V. Thanks for your comment! As explained [here](https://support.office.com/en-us/article/Display-numbers-in-scientific-exponential-notation-f85a96c0-18a1-4249-81c3-e934cd2aae25), the capital `E` is always used with numbers with more than 10 digits. More generally, I suggest you read more about how [Excel implements the IEEE 754 standard](https://superuser.com/questions/413226/why-does-excel-treat-long-numeric-strings-as-scientific-notation-even-after-chan). Good luck! – GalAbra Feb 19 '18 at 13:39
  • @GalAbra and what about the other numbers which are not date your code is going to convbert everything in date – Mandrek Feb 22 '18 at 13:29
  • @Mandrek You've defined the inputs as either decimal fractions (that represent percentages) or dates. If you want to allow integers, you need to add a restriction for a maximal integer (above which the input will be dealt with as a date) – GalAbra Feb 22 '18 at 13:32
  • @GalAbra in the else condition you are assuming that input is an integer , but that is not the case i am getting a number or lets say dauble which is 43567 and which is a date , i have to decide that is a date first then ave to covert – Mandrek Feb 22 '18 at 13:38
  • @Mandrek That's right, I meant that you have to define a condition to decide whether your input is a date or something else – GalAbra Feb 22 '18 at 13:45
  • @GalAbra can you give a solution of that ? – Mandrek Feb 23 '18 at 05:39
  • @Mandrek I’ve already mentioned that *you* should specify the condition, because only you know the expected inputs – GalAbra Feb 23 '18 at 07:31
0

It seems to me that you can distinguish your strings from Excel in a way similar to this:

private static void checkNumber(String fromExcel) {
    try {
        double asNumber = Double.parseDouble(fromExcel);
        if (asNumber >= 0 && asNumber <= 1) {
            System.out.println("Percentage: " + asNumber * 100 + " %");
        }
        long asWholeNumber = Math.round(asNumber);
        try {
            LocalDate asDate = LocalDate.of(1899, Month.DECEMBER, 30)
                    .plusDays(asWholeNumber);
            if (asDate.isAfter(LocalDate.of(2000, Month.DECEMBER, 31)) 
                    && asDate.isBefore(LocalDate.of(2035, Month.JANUARY, 1))) {
                System.out.println("Date: " + asDate);
            }
        } catch (DateTimeException dte) {
            System.out.println("Unidentified: " + fromExcel);
        }
    } catch (NumberFormatException nfe) {
        System.out.println("Unidentified: " + fromExcel);
    }
}

Try this method out:

    checkNumber("44299");
    checkNumber("3.6999999999999998E-2");

This prints:

Date: 2021-04-13
Percentage: 3.6999999999999997 %

That there are two possible interpretations for the strings should not keep you from doing a validation and catching strings that do not conform with either interpretation, so I am trying some filtering of each case. Please note that if you include 1899 in the accepted dates, "0" and "1" will be accepted both as dates and as percentages.

I am using and recommending LocalDate from java.time, the modern Java date and time API, for handling dates. The modern API is so much nicer to work with then the outdated Date and GregorianCalendar.

Gary’ Student’s idea of getting a formatted cell might be a more correct way to go.

Question: My code needs to be compatible with Java 6; can I use java.time?

EDIT: Yes, java.time can work nicely in Java 6.

  • In Java 8 and later and on new Android devices (from API level 26, I’m told) the new API comes built-in.
  • In Java 6 and 7 get the ThreeTen Backport, the backport of the new classes (ThreeTen for JSR 310, where the modern API was first described). EDIT 2: Make sure you import the date and time classes and exceptions from package org.threeten.bp and subpackages.
  • On (older) Android, use the Android edition of ThreeTen Backport. It’s called ThreeTenABP. Also here make sure you import the date and time classes from package org.threeten.bp and subpackages.

Links

Ole V.V.
  • 81,772
  • 15
  • 137
  • 161
  • V.V is this for java 8 ? I need a java 6 compatible code – Mandrek Feb 22 '18 at 12:28
  • V.V have you edited the code? coz i think LocalDate is a problem which is not working in java 6 – Mandrek Feb 22 '18 at 12:41
  • V.V i can not see any change in the code what you have edited ? – Mandrek Feb 22 '18 at 12:44
  • No code change is needed for Java 6 (so I haven’t made any). I have added a couple of sections at the bottom explaining. – Ole V.V. Feb 22 '18 at 12:47
  • i got this exception Activity number 3 ready to be processed Following are the input parameters: 1. inputFile= E:\Oracle\FDMEEDATA\MANREP/inbox/MREP_File_Dummy/NNLife-January2018MarketValueReport.xlsx 2. outputFile= E:\Oracle\FDMEEDATA\MANREP/inbox/MREP_File_Dummy/out.txt 3. monthName =Dec'17 Exception in thread "main" java.lang.NoClassDefFoundError: java/time/DateTimeException at com.excelmanager.ActivityLauncher.main(ActivityLauncher.java:68) – Mandrek Feb 22 '18 at 12:47
  • V.V then why i am getting DateTimeException ? – Mandrek Feb 22 '18 at 12:48
  • Ah, when using the backport, you need to import from `org.threeten.bp`, for example `import org.threeten.bp.DateTimeException;`. I have edited again (you were not getting `DateTimeException`, you were getting `NoClassDefFoundError`, but that certainly isn’t better). – Ole V.V. Feb 22 '18 at 12:53
  • getting this java.lang.NoClassDefFoundError: java/time/chrono/ChronoLocalDate – Mandrek Feb 22 '18 at 13:06
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/165647/discussion-between-ole-v-v-and-mandrek). – Ole V.V. Feb 22 '18 at 13:07
  • V.V i am compiling using java 6 so i saw it comes only when you don't have java 8 – Mandrek Feb 23 '18 at 06:35
  • I am still [in the chat](http://chat.stackoverflow.com/rooms/165647/discussion-between-ole-v-v-and-mandrek) (checking back a couple of times a day for a day or two still). – Ole V.V. Feb 23 '18 at 09:45