7

So I'm trying to write a program that scans for a specific pattern in a row of an excel file. Namely for an N followed by any letter, then an S or a T (with each letter occupying a single cell).

The problem is, the excel file that I'm using is absolutely massive, with roughly 3000 rows and nearly 1000 columns. I'm trying to search for this pattern only in the first 60 rows in order to reduce java heap space. How can I suit my algorithm to do this? I'm still getting out of memory exceptions.

My code is as follows:

import java.awt.List;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;

import org.apache.poi.EncryptedDocumentException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelReader {

    public int Reader(File file) throws IOException, EncryptedDocumentException, InvalidFormatException {
        FileInputStream fis = new FileInputStream(file);
        String filepath = file.getPath();
        Workbook wb = WorkbookFactory.create(new File(filepath));
        XSSFSheet sheet = (XSSFSheet) wb.getSheetAt(0);
        XSSFRow row;
        XSSFCell cell;
        ArrayList<Integer> list = new ArrayList<Integer>();

        int rows;
        int cols = 0;
        int temp = 0;
        rows = sheet.getPhysicalNumberOfRows();

        for (int i = 0; i < 10 || i < 60; i++) {
            row = sheet.getRow(i);
            if (row != null) {
                temp = sheet.getRow(i).getPhysicalNumberOfCells();
                if (temp > cols)
                    cols = temp;
            }
        }
        for (int r = 0; r <= 60; r++) {
            row = sheet.getRow(r);
            if (row != null) {
                for (int c = 0; c <= cols; c++) {
                    int numblanks = 0;
                    cell = row.getCell((short) c);
                    if (cell != null) {
                        //System.out.print(cell + "\t\t");
                    } else {
                        //System.out.print("\t\t");
                    }
                    if (cell != null && cell.getCellType() == XSSFCell.CELL_TYPE_STRING) {
                        if ("N".equals(cell.getStringCellValue())) {
                            for (int k = c; k <= cols; k++) {
                                if ("-".equals(row.getCell(k).getStringCellValue())) {
                                    numblanks++;
                                    continue;
                                }
                                if ("S".equals(row.getCell(c + 2 + numblanks).getStringCellValue())
                                        || "T".equals(row.getCell(c + 2 + numblanks).getStringCellValue())) {
                                    list.add((int) sheet.getRow(1).getCell(c).getNumericCellValue());
                                    break;
                                }
                            }
                        }
                    }
                }
                System.out.println();
            }
        }
        System.out.println();
        System.out.println("Rows: " + rows);
        System.out.println("Columns: " + cols);
        System.out.println(list);
        return temp;
    }
}
jmc1094
  • 135
  • 1
  • 11
  • 2
    Can you not convert it to a CSV and then just arbitrarily read n bytes from the file? – user1231232141214124 Jan 28 '16 at 19:04
  • 1
    In VBA you can use `ExecuteExcel4Macro("'" & path & "[" & file & "]" & sheet & "'!" & range)` to get values without opening the file... But I doubt there is anything like that in java... but maybe you can use some converting or create an "autorun"-workbook which uses this to create a copy for just a part of the original file... Just an idea. – Dirk Reichel Jan 28 '16 at 19:34
  • 1
    Which line is giving you the problem??? And do you mean to do this 10 or 60 times :: (int i = 0; i < 10 || i < 60; i++) ??? And does "-" really count blanks or hyphens??? – donPablo Jul 21 '18 at 04:10
  • maybe the find method could help you https://learn.microsoft.com/en-us/office/vba/api/excel.range.find – Wouter Oct 18 '18 at 09:33
  • 1
    I don't know if you're aware of this, but an *.xlsx file is in fact a zipped XML structure, but it's quite difficult to understand. If you manage to understand how this works, you might launch some `XPath` queries on that structure and get your information easier. – Dominique Nov 30 '18 at 14:00
  • Why don't you save of copy of the first 60 rows? – simple-solution Jan 25 '19 at 23:15
  • Use `ACE.OLEDB` and query your Excel file. – Michael Z. Apr 27 '19 at 04:20
  • https://stackoverflow.com/questions/33786219/apache-poi-streaming-sxssf-for-reading please check this question and answer – user1516873 Oct 16 '19 at 11:19

1 Answers1

0

Convert to CSV file which is easy enough to do. If possible I would do an insert of the data into a database table and use a procedure to search and find what you're looking for. This can be done using Spring Batch and Java