
Does anyone know of something similar to ArrayList that is better geared to handling really large amounts of data as quickly as possible?

I've got a program with a really large ArrayList that's getting choked up when it tries to explore or modify the ArrayList.

Presumably when you do:

//i is an int;
arrayList.remove(i);

The code behind the scenes runs something like:

public T remove(int i) {
    // Let's say ArrayList stores its data in a T[] array called "contents".
    T output = contents[i];
    T[] overwrite = new T[contents.length - 1];
    // Yes, I know generic arrays aren't created this simply. Bear with me here...
    for (int x = 0; x < i; x++) {
        overwrite[x] = contents[x];
    }
    for (int x = i + 1; x < contents.length; x++) {
        overwrite[x - 1] = contents[x];
    }
    contents = overwrite;
    return output;
}

When the size of the ArrayList is a couple million units or so, all those cycles rearranging the positions of items in the array would take a lot of time.

I've tried to alleviate this problem by creating my own custom ArrayList subclass which segments its data storage into smaller ArrayLists. Any process that requires the ArrayList to scan its data for a specific item spawns a new search thread for each of the smaller ArrayLists within (to take advantage of my multiple CPU cores).

But this system doesn't work: when the thread calling the search holds a lock on an item in any of the ArrayLists, it can block those separate search threads from completing their search, which in turn locks up the original thread that called the search, essentially deadlocking the whole program.

I really need some kind of data storage class oriented to containing and manipulating large amounts of objects as quickly as the PC is capable.

Any ideas?

Cambot
  • Try using LinkedList<> – Dave Ranjan Apr 06 '17 at 10:21
  • How about an array? It would need a few helper functions of course. – Steve Smith Apr 06 '17 at 10:21
  • I'd use a ConcurrentLinkedHashMap – Luke Garrigan Apr 06 '17 at 10:23
  • I am inclined to close as DUP to http://stackoverflow.com/questions/559839/big-o-summary-for-java-collections-framework-implementations ... but for now: please check if that page gives you the information you need to get going. – GhostCat Apr 06 '17 at 10:29
  • In contrast to your assumption, `ArrayList.remove` implementations typically just do a System.arrayCopy from `i+1..end` to `i..end-1`. They don't allocate any extra space. (Remember that the backing array of an ArrayList is typically larger than the size of the ArrayList.) The System.arrayCopy is often native code and quite fast. Are you sure it's the `remove` method that is using up CPU time? – Klitos Kyriacou Apr 06 '17 at 11:00
  • @Dave_Ranjan, per your suggestion, I tried LinkedList. Sorry to say that it runs even slower than the ArrayList. – Cambot Apr 07 '17 at 11:30
  • @Klitos_Kyriacou Yeah, I have a monitor thread that publishes frequent updates about where all the other threads are, and the bottleneck seems to be a synchronized chunk of code whose only two significant lines are an ArrayList.remove() operation and an ArrayList.add(0, i) operation. More often than not, the remove() seems to be the choke point. – Cambot Apr 07 '17 at 11:41

3 Answers


I really need some kind of data storage class oriented to containing and manipulating large amounts of objects as quickly as the PC is capable.

The answer depends a lot on what sort of data you are talking about and the specific operations you need. You use the word "explore" without defining it.

If you are talking about looking up a record, then nothing beats a HashMap (or a ConcurrentHashMap for threaded operation). If you are talking about keeping items in order, especially when dealing with threads, then I'd recommend a ConcurrentSkipListMap, which has O(log N) lookup, insert, remove, etc.
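
For instance, a minimal sketch of the ConcurrentSkipListMap approach (the MyRecord class and the long keys here are placeholders, not something from the question):

import java.util.concurrent.ConcurrentSkipListMap;

// Hedged sketch: a ConcurrentSkipListMap keeps its entries sorted by key and
// gives O(log N) get/put/remove that are safe to call from several threads.
class SkipListSketch {
    static class MyRecord { /* your ~60 floats and dozen ints would live here */ }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Long, MyRecord> records = new ConcurrentSkipListMap<>();
        records.put(42L, new MyRecord());                   // O(log N) insert, stays ordered by key
        MyRecord r = records.get(42L);                      // O(log N) lookup
        MyRecord front = records.firstEntry().getValue();   // smallest key = the "front" entry
        records.remove(42L);                                // O(log N) removal, nothing is shifted
    }
}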

You may also want to consider using multiple collections. You need to be careful that the collections don't get out of sync, which can be especially challenging with threads, but that might be faster depending on the various operations you are making.

When the size of the ArrayList is a couple million units or so, all those cycles rearranging the positions of items in the array would take a lot of time.

As mentioned, ConcurrentSkipListMap is O(log N) for rearranging an item, i.e. removing it and adding it back at a new position.

The [ArrayList.remove(i)] code behind the scenes runs something like: ...

Well, not really. You can look at the code in the JDK, right? ArrayList uses System.arraycopy(...) for these sorts of operations. That may still not be fast enough for your case, but it is a single native bulk copy within the existing backing array, not a fresh allocation plus an element-by-element loop.
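
For reference, a simplified sketch of what the removal actually looks like (this is not the verbatim JDK source; range checks and modification counting are left out, but elementData and size really are ArrayList's internal fields):

@SuppressWarnings("unchecked")
public E remove(int index) {
    E oldValue = (E) elementData[index];
    int numMoved = size - index - 1;
    if (numMoved > 0) {
        // shift the tail one slot to the left with a single native bulk copy
        System.arraycopy(elementData, index + 1, elementData, index, numMoved);
    }
    elementData[--size] = null;   // clear the vacated last slot so the GC can collect it
    return oldValue;
}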

Gray
  • Order is important, so it sounds like I'd need `ConcurrentSkipListMap`. That being the case, how would I replicate the operation `ArrayList.add(int, Object)` with this class? This is one aspect of ArrayList my program uses a fair bit, so any substitute will need to be able to perform a similar function. – Cambot Apr 07 '17 at 19:12
  • Well you can certainly use that `int` as the key to the object @Cambot. So you would `remove(int)` and then `put(int, object)` with a new object. You could also add the `int` as a field to your object. If you don't control the object then you could create a wrapper object that handles the `hashCode()` and `equals()` methods to do that for you. – Gray Apr 07 '17 at 21:29

One example of a good use case for a linked list is where the list elements are very large, i.e. large enough that only one or two fit in the CPU cache at the same time. At that point the iteration advantage of contiguous-block containers like vectors or arrays is more or less nullified, and a performance advantage may be possible if many insertions and removals are occurring in real time.

ref: Under what circumstances are linked lists useful?

ref: https://coderanch.com/t/508171/java/Collection-datastructure-large-data
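
If you do go the linked-list route, keep in mind that the O(1) removal only materializes when you are already standing at the node, e.g. via a ListIterator; LinkedList.remove(i) still walks the list. A hedged sketch (the filter condition is just a placeholder):

import java.util.LinkedList;
import java.util.ListIterator;

class IteratorRemovalSketch {
    // Removal through the iterator unlinks the current node in O(1);
    // calling list.remove(i) instead would re-walk the list from the head each time.
    static void dropEmptyStrings(LinkedList<String> list) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            if (s.isEmpty()) {    // placeholder condition, not from the question
                it.remove();      // O(1) unlink of the node the iterator just returned
            }
        }
    }
}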

Dave Ranjan
  • In Java, there are no arrays or ArrayLists with _large_ elements. The largest possible element would be a long or a 64-bit pointer/reference. – Klitos Kyriacou Apr 06 '17 at 10:56
  • I think his concern is the number of elements in the list and not the size of each element – Dave Ranjan Apr 06 '17 at 12:02
  • Well, the elements I'm working with are primarily custom objects that ultimately contain about 60 floats and maybe a dozen or so ints, give or take. Most of those numbers are actually contained within other custom objects held within. – Cambot Apr 07 '17 at 11:56

Different collection types have different time complexities for various operations. Typical complexities are O(1), O(N), and O(log(N)). To choose a collection, you first need to decide which operations you use most often, and avoid collections that have O(N) complexity for those operations. Here you often use the operation ArrayList.remove(i), which is O(N). Even worse, you use remove(i) and not remove(element). If remove(element) were the only operation used often, then LinkedList could help: once you are standing at the element (for example via an iterator), unlinking it is O(1). But LinkedList.remove(i) is also O(N).

I doubt that a List with remove(i) complexity of O(1) can be implemented. The best achievable time is O(log(N)), which is definitely better than O(N). The Java standard library has no such implementation; you can search for it using the keywords "binary indexed tree".
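
To illustrate the kind of structure that keyword points at, here is a minimal, hypothetical sketch (the class name and API are invented for this example, and it deliberately supports only get and remove, not insertion): the elements stay put in a fixed backing array while a Fenwick (binary indexed) tree counts which slots are still live, so remove(i) becomes O(log N) and nothing is ever shifted.

import java.util.List;

class FenwickBackedList<T> {
    private final Object[] slots;  // elements never move; removed slots become null
    private final int[] tree;      // 1-indexed Fenwick tree counting live slots
    private int size;

    FenwickBackedList(List<T> source) {
        slots = source.toArray();
        size = slots.length;
        tree = new int[slots.length + 1];
        for (int i = 1; i <= slots.length; i++) {   // O(N) build, every slot starts live
            tree[i] += 1;
            int parent = i + (i & -i);
            if (parent <= slots.length) tree[parent] += tree[i];
        }
    }

    public int size() {
        return size;
    }

    // Physical 1-indexed slot holding the element with logical index i (0-indexed).
    private int slotOf(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        int k = i + 1;                              // rank among the live slots
        int pos = 0;
        for (int step = Integer.highestOneBit(slots.length); step > 0; step >>= 1) {
            int next = pos + step;
            if (next <= slots.length && tree[next] < k) {
                pos = next;                         // skip this whole block of live slots
                k -= tree[next];
            }
        }
        return pos + 1;
    }

    @SuppressWarnings("unchecked")
    public T get(int i) {                           // O(log N)
        return (T) slots[slotOf(i) - 1];
    }

    @SuppressWarnings("unchecked")
    public T remove(int i) {                        // O(log N), no shifting at all
        int slot = slotOf(i);
        T removed = (T) slots[slot - 1];
        slots[slot - 1] = null;
        for (int k = slot; k <= slots.length; k += k & -k) {
            tree[k] -= 1;                           // this slot no longer counts as live
        }
        size--;
        return removed;
    }
}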

But the first thing I would do is to review the algorithm and try to get rid of the List.remove(i) operation altogether.
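
For example, if the removals come in batches, one possible workaround (my own sketch, assuming the list never stores real nulls) is to mark the doomed positions and compact the list in a single pass, paying O(N) once instead of O(N) per removed element:

import java.util.ArrayList;
import java.util.Objects;

class BatchRemovalSketch {
    // Hypothetical helper: set() each doomed index to null in O(1), then let one
    // removeIf() pass shift the survivors together, instead of an O(N) shift per remove(i).
    static <T> void removeAllIndices(ArrayList<T> list, Iterable<Integer> indicesToRemove) {
        for (int i : indicesToRemove) {
            list.set(i, null);              // mark in place, no shifting yet
        }
        list.removeIf(Objects::isNull);     // single compaction pass over the whole list
    }
}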

Alexei Kaigorodov
  • So would I have better performance doing `ArrayList.remove(ArrayList.get(i))`, instead of just `ArrayList.remove(i)`? – Cambot Apr 07 '17 at 11:18
  • @Cambot of course not. `ArrayList.remove(object)` is O(N) just like `ArrayList.remove(i)`, and in fact even slower, because it has to search for the object first. – Alexei Kaigorodov Apr 07 '17 at 16:22
  • Yeah, I figured that probably wouldn't make things better, because if it did, surely the Java people would've just recoded ArrayList to work that way. – Cambot Apr 07 '17 at 19:04