1

My file is 14GB and I would like to read line by line and will be export to excel file.

As the file include different language, such as Chinese and English,
I tried to use FileInputStream with UTF-16 for reading data,
but result in java.lang.OutOfMemoryError: Java heap space
I have tried to increase the heap space but problem still exist
How should I change my file reading code?

createExcel();     //open a excel file
try {

    //success but cannot read and output for different language
    //br = new BufferedReader(
    //        new FileReader("C:\\Users\\brian_000\\Desktop\\appdatafile.json"));


    //result in java.lang.OutOfMemoryError: Java heap space
    br = new BufferedReader(new InputStreamReader(
            new FileInputStream("C:\\Users\\brian_000\\Desktop\\appdatafile.json"), 
            "UTF-16"));

} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} 

System.out.println("cann be print");


String line;
int i=0;
try {
    while ((line = br.readLine()) != null) {
        // process the line.
        try{
            System.out.println("cannot be print");
            //some statement for storing the data in variables.



                   //a function for writing the variable into excel
writeToExcel(platform,kind,title,shareUrl,contentRating,userRatingCount,averageUserRating
                            ,marketLanguage,pricing
                            ,majorVersionNumber,releaseDate,downloadsCount);


            }
            catch(com.google.gson.JsonSyntaxException exception){
                System.out.println("error");
            }



            // trying to get the first 1000rows
            i++;

            if(i==1000){
                br.close();

                break;
            }
        }
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }


    closeExcel();




public static void writeToExcel(String platform,String kind,String title,String shareUrl,String contentRating,String userRatingCount,String averageUserRating
            ,String marketLanguage,String pricing,String majorVersionNumber,String releaseDate,String downloadsCount){

        currentRow++;
        System.out.println(currentRow);

        if(currentRow>1000000){
            currentsheet++;
            sheet = workbook.createSheet("apps"+currentsheet, 0);
            createFristRow();
            currentRow=1;
        }



        try {

                //character id
                Label label = new Label(0, currentRow, String.valueOf(currentRow), cellFormat);
                sheet.addCell(label);

                //12 of statements for write the data to excel
                label = new Label(1, currentRow, platform, cellFormat);
                sheet.addCell(label);




            } catch (WriteException e) {
                e.printStackTrace();
            }
brian661
  • 538
  • 3
  • 11
  • 31
  • 2
    This code alone should not result in an OOM; please post the full code. Also, if you use Java 7+, please, please drop `File` and use java.nio.file. – fge Mar 04 '15 at 15:08
  • 1
    "As the file include different language, such as Chinese and English, I tried to use FileInputStream with UTF-16 for reading data" - well is the file *actually* in UTF-16? You shouldn't use an encoding without checking whether it's correct. And are you holding onto the lines you've read? – Jon Skeet Mar 04 '15 at 15:13
  • Is the error thrown here? br = new BufferedReader(new InputStreamReader(new FileInputStream("C:\\Users\\brian_000\\Desktop\\appdatafile.json"), "UTF-16")); – Jama Djafarov Mar 04 '15 at 15:18
  • I have edit my code, and I believe somethings wrong with the readLine(). I am not quiet sure is it is in UTF-16, just I have face and solve similar problem by reading as utf16, how can I check which encoding it is using? – brian661 Mar 04 '15 at 15:21
  • 1
    I think your file reading code is good. The problem will be the Excel file... what happens in `writeToExcel`? I guess the Excel data structure will grow in memory until the whole thing crashes. – Sky Mar 04 '15 at 15:22
  • @user3026820: To learn the encoding of the file, check the source: Ask the people who created it. If it's a web server, it should send a header with the encoding. If it's JSON, then the file must be `UTF-8` (defined by the standard). – Aaron Digulla Mar 04 '15 at 15:22
  • As my file is JSON, then I should use UTF-8. For the write to excel function, I have post to the question, but as for the first method of reading file, the excel file can be successfully be created, I believe it is okay – brian661 Mar 04 '15 at 15:28

1 Answers1

1

Excel, UTF-16

As mentioned, the problem is likely caused by the Excel document construction. Try whether UTF-8 yields a lesser size; for instance Chinese HTML still is better compressed with UTF-8 rather than UTF-16 because of the many ASCII chars.

Object creation java

You can share common small Strings. Useful for String.valueOf(row) and such. Cache only strings with a small length. I assume the cellFormat to be fixed.

DIY with xlsx

Excel builds a costly DOM. If CSV text (with a Unicode BOM marker) is no options (you could give it the extension .xls to be opened by Excel), try generating an xslx. Create an example workbook in xslx. This is a zip format you can process in java easiest with a zip filesystem. For Excel there is a content XML and a shared XML, sharing cell values with an index from content to shared strings. Then no overflow happens as you write buffer-wise. Or use a JDBC driver for Excel. (No recent experience on my side, maybe JDBC/ODBC.)

Best

Excel is hard to use with that much data. Consider more effort using a database, or write every N rows in a proper Excel file. Maybe you can later import them with java in one document. (I doubt it.)

Community
  • 1
  • 1
Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
  • The problem is solved after I changed to UTF-8. Would you mind also suggest a database software? It will be good if the data can be access just with a .db with the database software .exe – brian661 Mar 04 '15 at 16:17
  • I prefer embedded databases, stored as file, for a "small" isolated application. **h2database** comes to mind. – Joop Eggen Mar 04 '15 at 17:11