I got a "java.lang.OutOfMemoryError: Java heap space" error. I have read that I can increase the heap with -Xmx1024m, but I think I can change something in my code so that the error no longer happens.

First, this is the image from VisualVM showing my memory usage:

[Image: VisualVM screenshot of memory usage]

In the image you can see that the "Pedidos" objects are not that big. I have another kind of object, "Enderecos", that is more or less the same size, but it is never fully loaded because the error occurs before loading finishes.

The point is :

  • I have 2 classes that each read a big CSV file (400,000 values each); I will show the code. I tried calling the garbage collector and setting variables to null, but it is not working. Can anyone help me please? Here is the code for the class "Pedidos"; the class "Enderecos" is the same, and my project just calls these 2 classes.


// all Imports
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import javax.swing.JOptionPane;
import Objetos.Pedido;

// CLASS
public class GerenciadorPedido{
    // ArrayList to which I will add all the "Pedido" objects
    ArrayList<Pedido> listaPedidos = new ArrayList<Pedido>();

    // Int that I need to use the values correctly
    int helper;

    // I create this global because I didnt want to create a new String everytime the for is running (trying to use less memory)
    String Campo[];
    String Linha;
    String newLinha;

    public ArrayList<Pedido> getListaPedidos() throws IOException {


        // Here I change each "\" to "/" so the path is accepted by File (the CSV address)
        String Enderecotemp = System.getProperty("user.dir"), Endereco = "";
        char a;
        for (int i = 0; i < Enderecotemp.length(); i++) {
            a = Enderecotemp.charAt(i);
            if (a == '\\') a = '/';
            Endereco = Endereco + String.valueOf(a);
        }
        Endereco = Endereco + "/Pedido.csv";


        // Open the CSV File and the reader to read it
        File NovoArquivo = new File(Endereco);
        Reader FileLer = null;

        // Try to read the File
        try
        {
            FileLer = new FileReader(NovoArquivo);
        }

        catch(FileNotFoundException e) {
            JOptionPane.showMessageDialog(null, "Erro, fale com o Vini <Arquivo de Pedido Não Encontrado>");
        }

        // Read the File
        BufferedReader Lendo = new BufferedReader(FileLer);
        try
        {
            // Do for each line of the csv
            while (Lendo.ready()) {

                // Read the line and replace the characters (needed for the parsing to work)
                Linha = Lendo.readLine();
                newLinha = Linha.replaceAll("\"", "");
                newLinha = newLinha.replaceAll(",,", ", , ");
                newLinha = newLinha.replaceAll(",,", ", , ");
                newLinha = newLinha + " ";

                // Create Campo[x] for each value between ","
                Campo = newLinha.split(",");

                // Object 
                Pedido pedido = new Pedido();

                helper = 0;

                // Fill the object with the right values when Campo.length is 15, 16, 17, 18 or 19.
                switch (Campo.length) {
                    case 15: pedido.setAddress1(Campo[9]);
                        break;
                    case 16: pedido.setAddress1(Campo[9] + Campo[10]);
                        helper = 1;
                        break;
                    case 17: pedido.setAddress1(Campo[9] + Campo[10] + Campo[11]);
                        helper = 2;
                        break;
                    case 18: pedido.setAddress1(Campo[9] + Campo[10] + Campo[11] + Campo[12]);
                        helper = 3;
                        break;
                    case 19: pedido.setAddress1(Campo[9] + Campo[10] + Campo[11] + Campo[12] + Campo[13]);
                        helper = 4;
                        break;
                }

                // Complete the Object
                pedido.setOrder(Campo[0]);
                pedido.setOrderValue(Float.parseFloat(Campo[1]));
                pedido.setOrderPv(Float.parseFloat(Campo[2]));
                pedido.setCombinedOrderFlag(Campo[3]);
                pedido.setCombineOrder(Campo[4]);
                pedido.setOrderType(Campo[5]);
                pedido.setOrderShipped(Campo[6]);
                pedido.setOrderCancelled(Campo[7]);
                pedido.setTransactionType(Campo[8]);
                pedido.setAddress2(Campo[10 + helper]);
                pedido.setAddress3(Campo[11 + helper]);
                pedido.setPost(Campo[12 + helper]);
                pedido.setCity(Campo[13 + helper]);
                pedido.setState(Campo[14 + helper]);

                // Add the object in the ArrayList
                listaPedidos.add(pedido);

                // Set everything to null to start again
                Campo = null;
                Linha = null;
                newLinha = null;
            }
        }

        catch(IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        finally
        {
            // Close the file and run garbage collector to try to clear the trash
            Lendo.close();
            FileLer.close();
            System.gc();
        }
        // return the ArrayList.
        return listaPedidos;
    }
}

The project runs this class, but when it tries to run the other one (identical to this one except for the names and the CSV file), I get the memory error. I don't know how I can clear these char[] and String instances, which are too big, as you can see in the image. Any new ideas? Is it really impossible without increasing the memory?

trincot
Vinicius Martin
  • What is the size (in KB) of the flat files? – user1231232141214124 Oct 20 '15 at 15:53
  • Are you talking about 400k *lines* of csv? If you do, and each line becomes about 10 strings inside a "pedido", then 5 million string instances seems pretty reasonable. – JimmyB Oct 20 '15 at 15:55
  • By the way, which Java version are you using? [pre 7u6 appears to have an issue with substring](http://stackoverflow.com/questions/14161050/java-string-substring-method-potential-memory-leak). – JimmyB Oct 20 '15 at 15:55
  • You create an object that contains 14 strings for each line and keep it in a list. The list and every object in it isn't garbage collected. Your needed heap size will be roughly the size of the csv plus some overhead. – André Stannek Oct 20 '15 at 15:56
  • 42,000 KB for each file – Vinicius Martin Oct 20 '15 at 15:56
  • `// I create this global because I didnt want to create a new String everytime` -- Doesn't work this way. `Linha = Lendo.readLine();` creates a new string and *assigns* it to the global variable. Nothing can be gained here. Same for `Campo = newLinha.split(",");`: `split` creates a new array each time. No way to avoid that. – JimmyB Oct 20 '15 at 16:00
  • java version 1.8.0_40 – Vinicius Martin Oct 20 '15 at 16:00
  • @HannoBinder In my head, won't setting Campo to null every time erase the old split? – Vinicius Martin Oct 20 '15 at 16:05
  • @HannoBinder, the image shows the object sizes (the old splits and strings). Once I have put everything into the object and set all the variables to null, shouldn't the char[] and String memory be cleared? – Vinicius Martin Oct 20 '15 at 16:08
  • `Campo = newLinha.split(",");` will just as well release the reference to the old `Campo`. If you don't need big amounts of memory between `Campo = null` and `Campo = xyz;` then there's no difference. – JimmyB Oct 20 '15 at 16:08
  • @ViniciusMartin you set `Campo = null`, but before that you add strings from that array to `pedido`. Those references are kept. – André Stannek Oct 20 '15 at 16:09
  • @AndréStannek So in the image, I see that char[] has 5 million instances and Pedido has 345k. Is this 345k * 20 (Campo[x] can go up to 20) memory counted inside both Pedido and char[]? – Vinicius Martin Oct 20 '15 at 16:13
  • @ViniciusMartin Every string holds a `char[]` internally, so you'd expect at least as many `char[]` instances as you have strings. – JimmyB Oct 20 '15 at 16:16
  • Don't capitalize variable names; it makes things confusing and goes against best practices – user1231232141214124 Oct 20 '15 at 16:17
  • I see you setting 14 to 15 strings to each pedido object. `15*345770=5186550` so this sounds plausible. – André Stannek Oct 20 '15 at 16:18
  • @HannoBinder so is there no way to clear the memory used just to move the Strings from the CSV into the objects? :( – Vinicius Martin Oct 20 '15 at 16:20
  • @AndréStannek Two of those strings become floats, so it's rather 12-13 strings, but still very plausible. – JimmyB Oct 20 '15 at 16:21
  • @HannoBinder oh, I did miss that. Even more plausible since that number is lower than the total number of strings. – André Stannek Oct 20 '15 at 16:23
  • @ViniciusMartin No, probably not. But that's actually good news: You don't have to care about releasing any memory or object references. We verified that memory management and GC do what they're supposed to do and you can focus more on your algorithm. – JimmyB Oct 20 '15 at 16:23
  • If you cannot/don't want to spend that much memory, you should think about how many "pedidos" you need at the same time. Maybe you can read+process one after the other instead of reading 400k objects and then processing all of them in one go. – JimmyB Oct 20 '15 at 16:28
  • @HannoBinder got it. Thank you so much for all this help, I will increase the memory to run the project. :) – Vinicius Martin Oct 20 '15 at 16:28
  • @HannoBinder Yes, I will find a way to keep each pedido object only for as long as I really need it. – Vinicius Martin Oct 20 '15 at 16:29

1 Answer

As already discussed in the comments, the main factor is that your program holds everything in memory at the same time. That design inherently limits the size of the files you can process.

The way garbage collection works is that only garbage is collected. Any object that is still reachable through a chain of references is not garbage. So, starting with the "root" objects (anything declared as static or local variables currently on the stack), follow the references. Your GerenciadorPedido instance is surely referenced from main(). It references a list, listaPedidos. That list references (many) instances of Pedido, each of which references many String instances. Those objects will all remain in memory while they are reachable through the list.

The way to design your program so it doesn't have a limit on the size of the file it can process is to eliminate the list entirely. Don't read the entire file and return a list (or other collection). Instead implement an Iterator. Read one line from the CSV file, create the Pedido, return it. When the program is finished with that one, then read the next line and create the next Pedido. Then you will have only one of these objects in memory at any given time.
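As a rough illustration of that idea, here is a minimal sketch of such an iterator. The names PedidoResumo, PedidoIterator and PedidoIteratorDemo are hypothetical, PedidoResumo is a two-field stand-in for the real Pedido class, and the parsing is deliberately reduced to the first two columns; treat it as a starting point under those assumptions rather than a finished implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for Objetos.Pedido, reduced to two fields for illustration.
class PedidoResumo {
    final String order;
    final float orderValue;

    PedidoResumo(String order, float orderValue) {
        this.order = order;
        this.orderValue = orderValue;
    }
}

// Reads one CSV line per call to next(), so the iterator itself never holds
// more than the single line it has read ahead.
class PedidoIterator implements Iterator<PedidoResumo>, AutoCloseable {
    private final BufferedReader reader;
    private String nextLine; // line read ahead by hasNext(), or null when none is buffered

    PedidoIterator(Reader source) {
        this.reader = new BufferedReader(source);
    }

    @Override
    public boolean hasNext() {
        if (nextLine == null) {
            try {
                nextLine = reader.readLine(); // returns null at end of input
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return nextLine != null;
    }

    @Override
    public PedidoResumo next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String line = nextLine;
        nextLine = null;
        // Naive split, as in the question; a CSV library is more robust.
        String[] campo = line.replace("\"", "").split(",");
        return new PedidoResumo(campo[0], Float.parseFloat(campo[1]));
    }

    @Override
    public void close() {
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class PedidoIteratorDemo {
    // Sums the order values while keeping only the current object reachable.
    static float total(Reader source) {
        float sum = 0f;
        try (PedidoIterator it = new PedidoIterator(source)) {
            while (it.hasNext()) {
                sum += it.next().orderValue;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String csv = "\"A1\",10.5\n\"A2\",4.5\n";
        System.out.println(total(new StringReader(csv)));
    }
}
```

With the real file you would pass a FileReader instead of the StringReader. Each PedidoResumo becomes garbage as soon as the loop moves on, so the heap needed no longer grows with the size of the file.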

Some additional notes regarding your current algorithm:

  • every String object references a char[] internally that contains the characters

  • ArrayList has a memory cost when adding to a large list. Since it is backed by an array, when that array is full it must allocate an entirely new, larger array and then copy all of the references. While the copy is in progress both arrays are alive, so the list temporarily needs more than double the memory of its current backing array. Each copy also gets more expensive as the list grows, although copies happen less and less often.

    • One solution is to tell the ArrayList how large you will need it to be so you can avoid resizing. This is only applicable if you actually know how large you will need it to be. If you need 100 elements: new ArrayList<>(100).

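    • Another solution is to use a different data structure. A LinkedList does not need to allocate and copy an entire array when adding elements one at a time. Note, however, that it allocates a separate node object for every element, so its total memory footprint is usually larger than that of an ArrayList.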

  • Each call to .replaceAll() will create a new char[] for the new String object. Since you then orphan the previous String object, it will get garbage collected. Just be aware of this need for allocation.

  • Each string concatenation expression (e.g. newLinha + " " or Campo[9] + Campo[10]) will, on the Java 8 you are using, create a new StringBuilder object, append the operands, then create a new String object. This, again, can have an impact when repeated for large amounts of data.

  • You should, in general, never need to call System.gc(). It is okay to call it, but the system will perform garbage collection whenever memory is needed.
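A few of the notes above can be sketched in code. The class and method names below (AlocacaoSketch, normalizarCaminho, juntar, listaComCapacidade) are made up for illustration, and the capacity passed to the list is an estimate you would have to supply yourself. This only reduces temporary allocations; it does not change the fact that the full list stays in memory.

```java
import java.util.ArrayList;
import java.util.List;

class AlocacaoSketch {
    // Same result as the character-by-character loop in the question,
    // without building a new String for every character.
    static String normalizarCaminho(String dir) {
        return dir.replace('\\', '/') + "/Pedido.csv";
    }

    // Joins campo[inicio] .. campo[fim - 1] through a single StringBuilder.
    // Could replace the switch on Campo.length, e.g. juntar(Campo, 9, Campo.length - 5).
    static String juntar(String[] campo, int inicio, int fim) {
        StringBuilder sb = new StringBuilder();
        for (int i = inicio; i < fim; i++) {
            sb.append(campo[i]);
        }
        return sb.toString();
    }

    // Presized list: the backing array is allocated once if the estimate holds.
    static <T> List<T> listaComCapacidade(int linhasEsperadas) {
        return new ArrayList<>(linhasEsperadas);
    }
}
```

For a line with 17 fields, juntar(Campo, 9, 12) gives the same string as Campo[9] + Campo[10] + Campo[11] in the question.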

One additional note: your approach to parsing the CSV will fail when the data contains characters you aren't expecting, in particular if any of the fields contains a comma. I recommend using an existing CSV parsing library for a simple solution that correctly handles the entire definition of CSV. (I have had good experience with opencsv.)
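To make the comma problem concrete, here is a small hand-written sketch (the class CsvLinhaSketch and the method dividir are hypothetical, and this is not how opencsv is implemented). It treats commas inside double quotes as data and understands "" as an escaped quote, but it ignores other parts of the format such as fields that span several lines, which is one reason a library remains the better choice.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLinhaSketch {
    // Splits one CSV line, keeping commas that appear inside double quotes.
    // Handles "" as an escaped quote; does not handle multi-line fields.
    static List<String> dividir(String linha) {
        List<String> campos = new ArrayList<>();
        StringBuilder atual = new StringBuilder();
        boolean entreAspas = false;
        for (int i = 0; i < linha.length(); i++) {
            char c = linha.charAt(i);
            if (c == '"') {
                if (entreAspas && i + 1 < linha.length() && linha.charAt(i + 1) == '"') {
                    atual.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    entreAspas = !entreAspas; // opening or closing quote
                }
            } else if (c == ',' && !entreAspas) {
                campos.add(atual.toString()); // field boundary
                atual.setLength(0);
            } else {
                atual.append(c);
            }
        }
        campos.add(atual.toString()); // last field
        return campos;
    }
}
```

For a line such as 1,"Rua A, 10",SP a plain split(",") produces four pieces, while dividir returns three fields with the address intact. That would make the switch on Campo.length in the question unnecessary, assuming the address column is quoted in the file.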

dsh