4

Say an ArrayList is of size n.

In my case, I often need to remove from 1 to n elements with different indexes from an ArrayList.

By using visualvm profiler, I found the ArrayList.remove() took around 90% of the running time.

So I want to improve the performance of the removal. I wonder if it could be accelerated.

Here is a minimal example:

public void testArrayListRemove() {
        List list = new ArrayList();
        int[] indexes = new int[] { 1, 2, 4, 10, 100, 1000 };
        for (int i = 0; i < 100000; i++) {
            list.add(i);
        }
        for (int i = indexes.length - 1; i >= 0; i--) {
            list.remove(indexes[i]);
        }
    }

The idea I can think of is to exchange those to be removed elements to the end and remove them there so that ArrayList.remove() do not need to make system.arraycopy. I am not sure whether this will really work.

Note: ArrayList.remove(i) when i is not the last element, it will perform a System.arraycopy to move elements.

It would be very appreciated if you can provide ideas to deal with my problem. You can either comment on my naive idea of exchanging elements to the end or maybe even better provide more advanced algorithms other than my idea.

Thanks.

Changwang Zhang
  • 2,467
  • 7
  • 38
  • 64
  • 4
    _"I am not sure whether this will really work"_ Try it, perhaps? – Baby Jun 06 '14 at 08:08
  • 2
    Did not try this, but have you tried creating a new array list (with initial size equal to original size minus number of elements to be removed) and adding the elements you want to keep? (Of course whether this helps depends on the ratio of elements to be removed and to be retained.) – tobias_k Jun 06 '14 at 08:09
  • @ImmerAllein I will try it. And at the same time, I want to see if there are other better more advanced algorithms I want use. – Changwang Zhang Jun 06 '14 at 08:09
  • 1
    Ya know I was just looking at the LinkedList vs ArrayList http://stackoverflow.com/questions/322715/when-to-use-linkedlist-over-arraylist question not to long ago and here it is the perfect example of when to choose the right datatype. – ug_ Jun 06 '14 at 08:11
  • Also, have you tried using `removeAll`? I'd guess that that's already optimized in some way... – tobias_k Jun 06 '14 at 08:15
  • @Leo: refer the accepted answer in this similar post: http://stackoverflow.com/questions/23463634/java-data-structure-that-has-efficient-add-delete-and-random. – Shailesh Aswal Jun 06 '14 at 08:20
  • If you're doing a lot of removals then perhaps you're not using the correct data structure for the job? – Matt Coubrough Jun 06 '14 at 08:22
  • @MattCoubrough I am doing, lots of iteration, removal and some adding to the list. So I assume ArrayList is still the best option? – Changwang Zhang Jun 06 '14 at 08:25
  • iteration, removal and adding are all very performant with a linked list, do you need random access? – Matt Coubrough Jun 06 '14 at 08:26
  • @MattCoubrough Yes. the removal is kind of random access? And I need to get random elements as well. – Changwang Zhang Jun 06 '14 at 08:28
  • 1
    Only profiling will tell you what is most efficient for your use-case, But removing from a linked list only requires updating two fields per removal, whereas removing from an array list usually requires moving numerous elements. The performance differences are listed in this post here: http://stackoverflow.com/a/322742/3651800 – Matt Coubrough Jun 06 '14 at 08:32
  • 1
    @Leo Dont be hesitant to make your own data structure that fits your own needs best. In this case a LinkedList looks like the best tool for the job however the remove by index is definitely its Achilles heel. Consider making an implementation that will help optimize the index based inserts and removed. – ug_ Jun 06 '14 at 08:33
  • How many times do you delete data ? Do you get a bunch of indices to delete or, it happens sporadically ? Are there inserts happening ? Is this a one time operation ? – bsd Jun 10 '14 at 13:04

4 Answers4

2

You should take a look at GapList – a lightning-fast List implementation

From the article:


Introduction to GapList

To solve the issues brought out, we introduce GapList as another implementation of the java.util.List interface. As main features, GapList provides

  • Efficient access to elements by index
  • Constant time insertion at head and tail of list
  • Exploit the locality of reference often seen in applications

Let's see how GapList is implemented to offer these features.

If we compare how the different kind of inserts are handled by ArrayList, we can quickly come up with a solution to guarantee fast insertion both at the beginning and at the end of the list.

Instead of moving all elements to gain space at index 0, we leave the existing elements in place and write the elements at the end of the allocated array if there is space left. So we basically use the array as a kind of rotating buffer.

GapList1

For accessing the elements in the right order, we have to remember the start position of the first element and use a modulo operation to calculate the physical index from the logical one:

physIndex = (start + index) % capacity

To exploit the locality of reference, we allow a gap to be included in the storage of the list elements. The gap formed by the unused slots in the backing array can be anywhere in the list. There is at most one gap, but there can also be none.

This gap helps you to take advantage of the locality of reference to the list, so if you add an element to the middle of the list, a subsequent addition to the middle will be fast.

Middle

If a GapList has no gap, one is created if needed. If the gap is at a wrong place, it is moved. But if the operations happen near to each other, only few data will have to be copied.

GapList also allows removal of elements at the beginning and at the end without any moving of elements.

Remove

Removals in the middle are handled similar to insertions: an existing gap may be moved or vanish if no longer needed.


Here's a small sample code:

package rpax.stackoverflow.q24077045;

import java.util.*;
import java.util.concurrent.ThreadLocalRandom;
import org.magicwerk.brownies.collections.GapList;

public class Q24077045 {

    static int LIST_SIZE = 500000;

    public static void main(String[] args) {
        long a1, b1, c1 = 0, a2, b2, c2 = 0;
        int[] indexes = generateRandomIndexes(10000);

        a2 = System.currentTimeMillis();
        List<Integer> l2 = testArrayListRemove2(indexes);
        if (l2.size() < 1)
            return;
        b2 = System.currentTimeMillis();
        c2 = b2 - a2;

        a1 = System.currentTimeMillis();
        List<Integer> l = testArrayListRemove(indexes);
        if (l.size() < 1)
            return;
        b1 = System.currentTimeMillis();
        c1 = b1 - a1;

        System.out.println("1 : " + c1);
        System.out.println("2 : " + c2);

        System.out.println("Speedup : "+ c1 * 1.00 / c2+"x");

    }

    static int[] generateRandomIndexes(int number) {
        int[] indexes = new int[number];
        for (int i = 0; i < indexes.length; i++)
        {
            indexes[i] = ThreadLocalRandom.current().nextInt(0, LIST_SIZE);
        }
        Arrays.sort(indexes);
        return indexes;
    }

    public static List<Integer> testArrayListRemove(int[] indexes) {
        List<Integer> list = new ArrayList<Integer>(LIST_SIZE);

        for (int i = 0; i < LIST_SIZE; i++)
            list.add(i);

        for (int i = indexes.length - 1; i >= 0; i--)
            list.remove(indexes[i]);
        return list;
    }

    public static List<Integer> testArrayListRemove2(int[] indexes) {

        List<Integer> list = GapList.create(LIST_SIZE);

        for (int i = 0; i < LIST_SIZE; i++)
            list.add(i);

        for (int i = indexes.length - 1; i >= 0; i--)
            list.remove(indexes[i]);
        return list;
    }

}

I my laptop is about 10x faster. It seems to be a good alternative to ArrayList.

Disclaimer: This is not a performance analisis. It is only an illustrative example.

rpax
  • 4,468
  • 7
  • 33
  • 57
  • This is very interesting design. One question: how can you expand the list storage when the pre-allocation is not enough. If this need to move data around, will it cost all the benefits Gaplist have created? – Changwang Zhang Jun 11 '14 at 14:45
  • `BigList` produce `NullPointerException` and `GapList` is not efficient removing in the middle. – josejuan Jan 16 '17 at 17:12
0

You can deal with the array and iterate through it:

Integer[] arr = list.toArray(new int[]{});

int[] newArr = new int[arr.length-indices.length];

Now you'd System.arrayCopy each continguous block of the array:

for (int i=0;i<arr.length;i++) {
    for (int j : indexes) { // Should be 'indices' btw
        if (j == arr[i]) {
            // Array copy arr to newArr
            break;
        }
    }
}
Anubian Noob
  • 13,426
  • 6
  • 53
  • 75
  • 2
    Try googling ["_java 7 system.arraycopy_"](http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#arraycopy) that's how you learn everything in Java ;-) – jahroy Jun 06 '14 at 08:28
0

Check out the list of datastructures here. Pick one depending on your requirements. Like Guarev mentioned, a HashMap is probably your best bet. Hashmaps have the advantage of a constant time for insert, search, and delete.

ArrayLists are not a good structure for a storing a lot of data, as the memory usage quickly goes through the roof, and search/delete times get very expensive very quickly.

Community
  • 1
  • 1
Kyte
  • 834
  • 2
  • 12
  • 27
-1

ArrayList is not really a good data structure to do this operation.

I would suggest you to use the HashMap for this purpose, you can keep the key, value pair with the key as the indexes.

Gaurav
  • 531
  • 1
  • 4
  • 15