2

I wrote this Merge Sort which allow the user to call it by passing only two parameters (an ArrayList and a Comparator):

public static < T > void mergeSort(ArrayList < T > array, Comparator < T > c) {
    int high = array.size()-1;
    sort(array, c, 0, high, new ArrayList < T > (high/2));
  }  

  protected static < T > void sort(ArrayList < T > array, Comparator < T > c, int low, int high, ArrayList < T > tmp) {
    if (low < high) {
      int mid = low + (high - low) / 2;
      sort(array, c, low, mid, tmp);
      sort(array, c, mid + 1, high, tmp);
      merge(array, c, low, mid, high, tmp);
    }
  } 

  protected static < T > void merge(ArrayList < T > array, Comparator < T > c, int p, int mid, int q, ArrayList < T > tmp) {
    tmp.clear();
    int i = p;
    int j = mid + 1;
    int k = 0;
    for (; i <= mid && j <= q; k++) {
      if (c.compare(array.get(i), array.get(j)) < 0)
        tmp.add(k, array.get(i++));
      else
        tmp.add(k, array.get(j++));
    }
    if (i <= mid && j > q) {
      while (i <= mid)
        tmp.add(k++, array.get(i++));
    } else {
      while (j <= q)
        tmp.add(k++, array.get(j++));
    }
    for (k = 0; k < tmp.size()-p; k++)
      array.set(k + p, tmp.get(k));

  }
}

After calling it I tried to print its content (which I supposed to be order):

Sorting.mergeSort(arrayA, new LongComparator());
System.out.println(Arrays.toString(arrayA.toArray()));

But I got this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at java.util.Arrays.toString(Arrays.java:4574)

How can I improve my merge sort? Is the the temporary ArrayList the main cause of the error? Because this error happens when I try to order millions of data. With 2-3 elements it works. EDIT: This is my first version of the algorithm, which hasn't the support method with only two parameters that I need to do

public static < T > void sort(ArrayList < T > array, Comparator < T > c, int low, int high) {
    if (low < high) {
      int mid = low + (high - low) / 2;
      sort(array, c, low, mid);
      sort(array, c, mid + 1, high);
      merge(array, c, low, mid, high);
    }
  } 

@SuppressWarnings("unchecked")
  public static <T> void merge(ArrayList<T> array, Comparator<T> c, int p, int mid, int q) {
    Object[] tmp = new Object[q-p+1]; 
    int i = p;
    int j = mid+1;
    int k = 0;
    while (i <= mid && j <= q) {
        if (c.compare(array.get(i), array.get(j))<0)
          tmp[k] = array.get(i++);
        else
          tmp[k] = array.get(j++);
        k++;
    }
    if (i <= mid && j > q) {
        while (i <= mid) 
          tmp[k++] = array.get(i++);
    } else {
        while (j <= q)
          tmp[k++] = array.get(j++);
    }
    for (k = 0; k < tmp.length; k++)
      array.set(k+p, (T)tmp[k]);
  }
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
JimBelushi2
  • 285
  • 1
  • 3
  • 18
  • 2
    It definitively is about some heap memory. I suspect that the JVM heap memory is large enough to hold one copy of your dataset, but not two. You can increase the heap size by starting your program with `java -Xmx2048m ...` (in this example, you give the VM 2 GB of heap memory). – Turing85 May 05 '18 at 09:34
  • 1
    Also keep in mind that every recursive algorithm is eventually suspect to `StackOverflowException`s since neither the AoT- nor the hotspot compiler deploy tail call optimization. – Turing85 May 05 '18 at 09:36
  • In my first version I was declaring the temporary ArrayList into the merge function and It worked without any problem even on 20 millions of integer. The problem is that now I need to move the declaration into the sort method, and I think that the problem is caused by that. – JimBelushi2 May 05 '18 at 09:42
  • recursion is not the best choice, for this "bunch of" data. – xerx593 May 05 '18 at 09:45
  • ..and the core problem: You cannot assign more than `2^31-"a few"`elements to an Array(List).(https://stackoverflow.com/a/3039805/592355, https://stackoverflow.com/a/44546701/592355) – xerx593 May 05 '18 at 09:53
  • try [iterative](https://www.geeksforgeeks.org/iterative-merge-sort/) ..and/or consider `LinkedList` :) – xerx593 May 05 '18 at 10:02
  • @xerx593 OP taskl about 20 Million (`20_000_000`) elements. `2^31` is two magnitudes larger (`2_147_483_648`). The `Exception` would contain some information regarding maximum array size aswell. – Turing85 May 05 '18 at 10:02
  • I tried to print all the elements on a small file and it seems that the algorithm doesn't works. I'm adding the first version in the question (which works, but that hasn't the support method with only two parameters that I need to do) – JimBelushi2 May 06 '18 at 09:44
  • I think you could also have a chance of solving this by posting on the Code Review StackExchange – Yassin Hajaj May 06 '18 at 10:45
  • What happens if you try to print it without sorting? – rustyx May 06 '18 at 11:00

1 Answers1

0

Two ways to approach the problem:

Increase available memory: As was mentioned by Turing85, run java with more memory using VM option, -Xmx2048m to assign 2GB for example.

Decrease the memory used: Using ArrayList of basic types like Long and Double uses 4 times (in my simple experiment) as much memory as using the equivalent array of the basic types long / double.

ArrayList<Long> instead of long[]

will also make the code run significantly slower for several reasons (if you are aiming to use the merge sort for non-basic types you may see performance increase but memory gain will not be as significant)

DHa
  • 659
  • 1
  • 6
  • 21