
I have a very large dataset that I need to iterate over. I am using a very simple for loop in Python, shown below. The problem is that it takes 11 seconds to finish. I tested equivalent code in Java and it takes only 4 milliseconds, which is a huge difference. So the question is: how do I write an efficient loop in Python?

Here is the Python code; it prints "--- 11.902626514434814 seconds ---":

import time

start_time = time.time()
val = 0
for i in range(1, 17000):
    for j in range(1, 13000):
        val = 0

print("--- %s seconds ---" % (time.time() - start_time))

Here is the Java code and it prints "4":

long startTime = System.nanoTime();
int x = 0;
for (int i = 0; i < 17000; i++) {
    for (int j = 0; j < 13000; j++) {
        x = 0;
    }
}

long endTime = System.nanoTime();
System.out.println((endTime - startTime) / 1000000); 
Xi Jin
  • Most likely, the JVM detects that your loops don't actually do anything, and doesn't even try to execute them. I don't think the Python interpreter has the same kind of inbuilt smarts as the JVM, so you're actually getting 442 million variable assignments. If that takes 11 seconds, you're getting about 40 million per second, which doesn't seem too bad to me. – Dawood ibn Kareem Dec 19 '21 at 06:26
  • If the JVM was skipping the loops entirely then it wouldn't take 4 seconds. This is just a matter of the JRE being more efficient than CPython, though part of it will be due to Python creating `range` and `range_iterator` objects and the overhead of calling their methods. – kaya3 Dec 19 '21 at 06:33
  • @DawoodibnKareem I changed the code to `x = i + j;` and it took 8 milliseconds. – Xi Jin Dec 19 '21 at 06:35
  • It's 4 milliseconds in Java, not 4 seconds, @kaya3 – Dawood ibn Kareem Dec 19 '21 at 06:35
  • Oh, but wait, the Java one is actually measuring 4 *milliseconds*, since the divisor for nanoseconds to seconds is out by a factor of 1000. So yes, the JIT has eliminated that loop entirely. – kaya3 Dec 19 '21 at 06:36
  • Whether it's 4 milliseconds, 8 milliseconds or 4 seconds in Java, your question is still how to make the Python interpreter do an assignment, an increment and a range check in under 50 nanoseconds. I think the answer to your question is more likely to be hardware-based than software-based. – Dawood ibn Kareem Dec 19 '21 at 06:39
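
To make the numbers in the comments concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the timings reported in the question. The 3 GHz clock speed is an assumption for illustration, not something stated by the asker. It counts inner-loop iterations (not individual bytecode-level assignments) and divides the reported time by that count.

```python
# Timings copied from the question; everything else is derived from them.
python_seconds = 11.902626514434814
java_seconds = 4 / 1000  # the Java program printed 4, i.e. 4 milliseconds

# range(1, n) yields n - 1 values, so the Python loops run slightly fewer
# iterations than the Java loops, which start at 0.
python_iterations = len(range(1, 17000)) * len(range(1, 13000))
java_iterations = 17000 * 13000

# Average cost of one inner-loop iteration, in nanoseconds.
python_ns = python_seconds / python_iterations * 1e9
java_ns = java_seconds / java_iterations * 1e9

# ASSUMPTION: a hypothetical 3 GHz CPU, so one clock cycle is about 0.33 ns.
assumed_cycle_ns = 1 / 3.0

print(f"Python: {python_iterations:,} iterations, {python_ns:.1f} ns each")
print(f"Java:   {java_iterations:,} iterations, {java_ns:.3f} ns each")
print("Java under one cycle per iteration:", java_ns < assumed_cycle_ns)
```

Under these assumptions the Python loop costs roughly 54 ns per iteration. The Java figure works out to far less than a single clock cycle per iteration, which is consistent with the suggestion above that the JIT removed the loop rather than executing it.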
