1

I have a task to add in treeset > 10000000 elements that are sequence.

If I use

for (long index = 0; index < 10000000; index++)
    {
        result.add(index);
    }

It takes 8083 ms. Is there any solution to increase the performance of this task?

https://github.com/cyberterror/TestRanges

P.S. The fastest way at the moment is: List<Integer> range = IntStream.range(0, 10000000).boxed().collect(Collectors.toList()); with result of ~ 370 ms

  • This seems like a weird task. Is it homework? – Kayaman May 24 '16 at 11:02
  • 2
    Do you really need a Set of all numbers from 0 to 9,999,999? Perhaps it would make sense to maintain a complement Set (all the numbers between 0 and 9,999,999 not in your original Set). The complement Set is empty, so it's much quicker to initialize :). – Eran May 24 '16 at 11:03
  • 2
    [Obligatory link](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java). – Andy Turner May 24 '16 at 11:03
  • 1
    Eran hi. The thing is I try to use set to hold the large range of ip's. Maybe even hole range 0.0.0.0 - 255.255.255.255. Then i would like to manipulate them like adding or deleting ranges. – Игорь Рыбаков May 24 '16 at 11:05
  • You could use a `Stream` / `Iterator` to generate these data for you on the fly. – Tamas Rev May 24 '16 at 11:05
  • does it need to be in that order? if you go from 100...0 to 0 then it will be a little bit faster, because index is compared to 0 and not to a bigger number. Though I think the JVM will optimize it so after the 2nd and 3rd run it will process the loop in the same time – Arctigor May 24 '16 at 11:07
  • Is it possible to give a code snippet with stream. I was thinking about it, but do not know how to do this opperation with stream. For example add 4 bil. numbers in set ... – Игорь Рыбаков May 24 '16 at 11:08
  • *Not using a proper micro-benchmark method*, but I have found that it is a bit better if you write them into a pre-allocated `ArrayList` first, then add them all to the `TreeSet` at once. But the real performance killer here is the auto-boxing. – Andy Turner May 24 '16 at 11:08
  • Can you help with stream for this? – Игорь Рыбаков May 24 '16 at 11:12
  • according to the previous question the whole story is A-B problem and solutions don't really solve the initial problem. maybe it could be done much easier if you did not split it into parts – AdamSkywalker May 24 '16 at 11:17
  • 1
    Does it have to be a TreeSet? Could it be a BitSet? – OldCurmudgeon May 24 '16 at 11:24
  • 2
    I would probably use a data structure that represents ranges rather than individual IPs. Then you can have a set for each range start and another one for each range end, making operations with it a lot easier. Alternatively, you can try a bit field-like approach, where you have a (pretty long) byte array and each bit represents the presence/absence of that address. – biziclop May 24 '16 at 11:25
  • 1
    @biziclop - Perhaps an [IntervalTree](http://stackoverflow.com/a/25564351/823393)? – OldCurmudgeon May 24 '16 at 11:28
  • OldCurmudgeon - maybe. Have to test. biziclop - can you provide example code? – Игорь Рыбаков May 24 '16 at 11:30
  • @ИгорьРыбаков I'd go with that interval tree, it's a better idea than mine. :) – biziclop May 24 '16 at 12:08

4 Answers4

1

You already add your items in the correct order, the TreeSet will sort itself after each addded item which is complex, the LinkedHashSet simply keeps the insertion order.

So if you actually need a Set go for the LinkedHashSet implementation like here:

Set<Long> result = new LinkedHashSet<Long>();
for (Long index = 0L; index != 10000000L;) { //Avoid autoboxing
    result.add(index++);
}

Read here: https://dzone.com/articles/hashset-vs-treeset-vs

huellif
  • 414
  • 4
  • 12
1

Do you really need collection? For wich purpose, if so? Actually, using plain array you may improve performance drustically.

   long [] ar = new long[10000000];
    for (int i = 0; i < 10000000; i++) {
        ar[i] = (long )i;
    }

...

BUILD SUCCESS
------------------------------------------------------------------------
Total time: 0.553 s

UPD: Actually, it is possible to perform most operations on array using Arrays utility

long [] ar = new long[10000000];
for (int i = 0; i < 10000000; i++) {
    ar[i] = (long )i;
}

long[] copyOfRange = Arrays.copyOfRange(ar, 50000, 1000000);

...

BUILD SUCCESS
------------------------------------------------------------------------
Total time: 0.521 s
S. Kadakov
  • 861
  • 1
  • 6
  • 15
1

Try HPPC: High Performance Primitive Collections for Java

License: Apache License 2.0

<dependency>
  <groupId>com.carrotsearch</groupId>
  <artifactId>hppc</artifactId>
  <version>0.7.1</version>
</dependency>

LongHashSet executes in 1190ms:

LongSet result = new LongHashSet();
for (Long index = 0L; index < 10000000L;) {
  result.add(index++);
}

LongScatterSet executes in 850ms:

LongSet result = new LongScatterSet();
for (Long index = 0L; index < 10000000L;) {
  result.add(index++);
}
tak3shi
  • 2,305
  • 1
  • 20
  • 33
  • Good solution, but now I realize a more elegant solution, If not to save IPs as Long number but really to use some kind of self made class that can hold a range or one ip. After save it in set.... – Игорь Рыбаков May 24 '16 at 12:38
0

TreeSet is a balanced Red-Black tree. It takes so much time as the tree is being balanced every time you add a new item. Try to add the items in a different order; actually in this order:

  • 5 000 000 - middle of 0 and 10 000 000 (your set size is 10 000 000)
  • 2 500 000 - middle of 0 and 5 000 000
  • 7 500 000 - middle of 5 000 000 and 10 000 000
  • number in the middle of 0 and 2 500 000
  • number in the middle of 2 500 000 and 5 000 000
  • number in the middle of 5 000 000 and 7 500 000
  • number in the middle of 7 500 000 and 10 000 000
  • etc.

This way you will keep your tree always balanced and no additional operations will be performed (to balance the tree). Just make sure that your algorithm to count which number to add next is not too complex.

pepan
  • 678
  • 6
  • 11