1

I was just trying to compare the speed of filtering a List between Java and C++. Just for fun, because I've seen that C++ has std::vector.erase() which removes elements of a vector in place, so I was expecting it to be a lot faster than Java's equivalent, here's the code in Java:

public static void main(String[] args) {

    List<Integer> ints = new ArrayList<>(100000000);
    long t1, t2;
    int i;
    t1 = System.currentTimeMillis();
    for (i = 0; i < 100000000; i++) {
        ints.add(i);
    }
    t2 = System.currentTimeMillis();

    System.out.println("Initial array generated in " + (t2 - t1) + "ms");

    t1 = System.currentTimeMillis();
    List<Integer> result = ints.stream().filter((e) -> e % 2 == 0).collect(Collectors.toList());
    t2 = System.currentTimeMillis();


    System.out.println("Result: " + result.size() + " in " + (t2 - t1) + "ms.");
}

And the output:

Initial array generated in 21859ms
Result: 50000000 in 3135ms.

Here's the C++ equivalent:

#include <iostream>
#include <vector>
#include <algorithm>
#include <sys/time.h>

long gettime(){

   struct timeval time;
   gettimeofday(&time, NULL);

   long microsec = ((unsigned long long)time.tv_sec * 1000000) + time.tv_usec;

   return microsec / 1000;

}
int main(int argc, char **argv){

  std::vector<int> ints;

  long t1 = gettime();
  for(int i=0; i< 100000000; i++){
    ints.push_back(i);
  }
  long t2 = gettime();

  std::cout << "Initial array generated in " << (t2-t1) << "ms\n";


  t1 = gettime();
  ints.erase(std::remove_if(begin(ints), end(ints),
               [](const int i){ return i%2==0;}), end(ints));

  t2 = gettime();
  std::cout << "Result: " << ints.size() << " in " << (t2-t1) << "ms.\n";

 return 0;
}

And the C++ output:

Initial array generated in 1357ms
Result: 50000000 in 1323ms.

Ok, so the array filtering is x3 faster in C++ (I was expecting more, to be honest). The question though is, why is Java so slow at populating the List (22sec)?

Dani
  • 897
  • 6
  • 7
  • 6
    For one thing, you're boxing the all values in Java. – shmosel Nov 16 '18 at 20:55
  • An interesting insight would be to compare the time it takes to allocate a `std::vector*` on the heap to the time Java takes to instantiate and populate an `ArrayList` – Enfyve Nov 16 '18 at 21:04
  • 3
    You should preallocate the vector's storage in your C++ benchmark, using `reserve`. – Pezo Nov 16 '18 at 21:04
  • 1
    You're not warning up your java test code, so hotspot hasn't compiled it yet. See the duplicate about correct benchmarking in java. – Erwin Bolwidt Nov 16 '18 at 21:06
  • 1
    @Enfyve I guess in that case Java would win out (at least after proper warmup), because allocation itself is really fast. At least if there's no GC during filling. – Pezo Nov 16 '18 at 21:07
  • @FrançoisAndrieux For the purposes of a benchmark, I would imagine the additional overhead would skew results. But yes - in a real scenario smart pointers are a given. – Enfyve Nov 16 '18 at 21:18
  • 1
    @Enfyve `std::unique_ptr` has no overhead. Edit : On second though it might have some compared to raw pointers when moving. – François Andrieux Nov 16 '18 at 21:19
  • @Dani calling `vector::reserve(100000000)` is needed to follow the same semantics as the `ArrayList<>(100000000)` – Julien Villemure-Fréchette Nov 16 '18 at 21:47
  • @Dani You failed to tell us if you compiled your C++ code with optimizations turned on. If you didn't compile with optimizations on, all of this timing information you're showing us is meaningless for the C++ code. – PaulMcKenzie Nov 16 '18 at 22:05

1 Answers1

0

In Java ArrayList, every element is an object reference. When populating the list, your code allocates almost 100000000 new objects on the heap. Object allocation is not very fast, which explains why it takes 22 seconds.

To avoid object allocation, use arrays of primitive types (e.g. int[]). Unfortunately, Java generics cannot be parameterized with primitive types, so you cannot use ArrayList and other standard collections in this case.

abacabadabacaba
  • 2,662
  • 1
  • 13
  • 18
  • Thanks, that's what I was suspecting. As a note, using -O2 makes the vector filtering 18 times faster (from 1.3 secons to 72msec). – Dani Nov 16 '18 at 21:07
  • @Dani Note that the only c++ benchmarks that really matter are optimized builds. c++ relies on a lot of abstraction being optimized out for it's performance. – François Andrieux Nov 16 '18 at 21:13