
I have the Python code below, and it hits an out-of-memory error when appending 100 million items in a loop. Equivalent Java code doesn't have this issue at all, without any tuning.

  1. Is there any way to tune this with Python command-line options, like the HotSpot JVM options for Java?

  2. Is there any way to change the code so that it runs faster and uses less memory?


import datetime

mylist = []

before = datetime.datetime.now()

for _ in range(100000000):
    mylist.append(datetime.datetime.now())

print("List length-->", len(mylist))

after = datetime.datetime.now()

print('Python time taken in seconds--->', (after - before).seconds)

Post notes:

I suspect a memory leak in "datetime.datetime.now()".

Here is my Java code. Even without any JVM tuning, it runs more than 10 times faster, and the process completes in about 6 seconds.

In general, Java does a much better garbage collection job than Python. Java normally hasn't crashed on this kind of simple operation for the past 20 years. https://www.snaplogic.com/glossary/python-vs-java-performance

Note: Changing from System.currentTimeMillis() to new Date() makes no difference.

package demo;

import java.util.ArrayList;
import java.util.List;

public class Performance {

    public static void main(String[] args) {
        List<Long> mylist = new ArrayList<Long>();

        long before = System.currentTimeMillis();

        for (int i = 0; i < 100000000; i++) {
            mylist.add(System.currentTimeMillis());

        }

        long after = System.currentTimeMillis();
        System.out.println("Java time taken in milliseconds--->" + (after - before));

    }

}
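Whether this is a leak or just retention can be checked on a smaller scale. A minimal sketch, assuming CPython's standard tracemalloc module and a run scaled down to 200,000 items so it finishes quickly: if the traced memory drops back to roughly the baseline once the list is deleted, the datetime objects were being kept alive by mylist rather than leaked.

```python
import datetime
import gc
import tracemalloc

N = 200_000  # scaled down from 100_000_000 so the check runs quickly

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()  # returns (current, peak); keep current

mylist = [datetime.datetime.now() for _ in range(N)]
filled, _ = tracemalloc.get_traced_memory()

del mylist    # drop the only reference to the list and the datetimes in it
gc.collect()  # not strictly needed: reference counting already freed them
released, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"while held: {(filled - baseline) / N:.1f} bytes per element")
print(f"after del : {(released - baseline) / N:.1f} bytes per element")
```

The per-element figure covers both the datetime object and its slot in the list; multiplying it by 100,000,000 gives a rough idea of what the full run needs.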
  • According to this https://stackoverflow.com/a/44508341/1531124 ... your whole machine doesn't have enough memory. But it is kind of strange that Java would then work... well, that might depend on your OS. On Linux, a ulimit limit might kick in and prevent your Python process from requesting more memory. In order for readers to understand what is going on, I think you should add A) your OS type and version, and B) maybe the Java code you are using. – GhostCat Nov 23 '20 at 14:49
  • You could use a [generator](https://wiki.python.org/moin/Generators) – Maurice Meyer Nov 23 '20 at 14:50
  • ... could you share your Java code as well? – Maurice Meyer Nov 23 '20 at 14:53
  • Additionally, datetime.timestamp(datetime.now()) is 24 bytes against the 48 bytes of datetime.now() – ChoKaPeek Nov 23 '20 at 14:54
  • Why do you say there is a memory leak? – juanpa.arrivillaga Nov 23 '20 at 16:42
  • @juanpa.arrivillaga It ran out of memory, didn't it? Isn't it fairly obvious that creating datetime.now() objects drains memory? I run this in Eclipse, and I have a 1 GB memory limit for both Java and Python. There is no issue when running from the command prompt; my laptop has 32 GB. Java also used to be very slow 20 years ago compared to its peers C++ and Delphi. – little star Nov 23 '20 at 17:39
  • If you have a limit of 1GB then you are just creating too many objects. In Python, *just the list object itself* would require like, 0.8 gigs with a list of length 100_000_000. Add in the memory overhead of 100_000_000 `datetime` objects would definitely put you over the 1 GB limit. – juanpa.arrivillaga Nov 23 '20 at 17:42
  • OK, so what exactly are you asking? In both the Java and Python examples, garbage collection is not relevant, since in both cases the full data structure has to be kept in memory. However, in Java, you are creating an ArrayList of `long` values (which I believe have to be boxed in Long objects). In Python, you are creating a list of `datetime` objects, which, as noted above, are 48 bytes *per object*. There are more compact ways to do this in Python, e.g. using a pre-allocated `array.array` and filling it with `long`s – juanpa.arrivillaga Nov 23 '20 at 17:44
  • @juanpa.arrivillaga Both Java and Python have pros and cons. Java probably wins a little on performance here. Python code is very efficient and simple, and I love it too – little star Nov 23 '20 at 17:49
  • @littlestar I find that hard to believe; `Long.MAX_VALUE` will be something like `9,223,372,036,854,775,807` and that would *definitely require more than a gigabyte* – juanpa.arrivillaga Nov 23 '20 at 17:49
  • @littlestar Yes, there's no question: CPython, at least, is much more memory intensive and slower than Java, generally speaking. – juanpa.arrivillaga Nov 23 '20 at 17:49
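The pre-allocated array.array approach suggested in the comments could look roughly like this. A sketch, assuming integer nanosecond timestamps from time.time_ns() are an acceptable substitute for datetime objects, and scaled down to 1,000,000 items: each value is stored as 8 raw bytes inside the array instead of as a separate Python object referenced from a list.

```python
import array
import time

N = 1_000_000  # scaled down from 100_000_000; the per-element cost is what matters

# Pre-allocate a typed array of N signed 64-bit integers ('q'), all zero.
stamps = array.array('q', [0]) * N

# Fill it in place with integer nanosecond timestamps. Each value is stored
# inline as raw bytes, not as a separate Python object.
for i in range(N):
    stamps[i] = time.time_ns()

print(len(stamps), "items,", stamps.itemsize * len(stamps), "bytes of payload")
```

At 8 bytes per item, 100,000,000 timestamps would need about 800 MB of payload, which is under the 1 GB limit mentioned above, but not by much. A value can be turned back into a datetime when needed with datetime.datetime.fromtimestamp(ns / 1e9).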

2 Answers


Could you use a generator expression? You cannot take the length of such an expression, because the values are only generated as you iterate through it (and thus the memory requirements are extremely low). Here is a demo:

import datetime
import time


before = datetime.datetime.now()
mylist = (datetime.datetime.now() for _ in range(100000000))
after = datetime.datetime.now()

# the following would fail: a generator has no len()
# print("List length-->", len(mylist))

print('Python time taken in seconds--->', (after - before).seconds)

# get first 5 datetimes:
n = 0
for dt in mylist:
    print(dt)
    n += 1
    if n == 5:
        break

# get next 5 datetimes with sleeping:
time.sleep(1)
n = 0
for dt in mylist:
    print(dt)
    n += 1
    if n == 5:
        break
    time.sleep(1)

Prints:

Python time taken in seconds---> 0
2020-11-23 10:06:55.054316
2020-11-23 10:06:55.054316
2020-11-23 10:06:55.054316
2020-11-23 10:06:55.054316
2020-11-23 10:06:55.054316
2020-11-23 10:06:56.054372
2020-11-23 10:06:57.054869
2020-11-23 10:06:58.055935
2020-11-23 10:06:59.056067
2020-11-23 10:07:00.056201

Of course, you might as well just call datetime.datetime.now() whenever you want a new value rather than using a generator expression for this particular case. But the above shows the usefulness of generator expressions in general.
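The same idea can be written a bit more compactly. A minimal sketch (the name stamps and the use of itertools.islice are choices made here, not part of the answer above) showing that the generator's own size does not depend on the range length, and taking the first values with islice instead of a manual counter:

```python
import datetime
import itertools
import sys

N = 100_000_000

# The generator keeps only its paused state (including the range iterator),
# not the values it will produce, so its size does not grow with N.
stamps = (datetime.datetime.now() for _ in range(N))
print(sys.getsizeof(stamps), "bytes for the generator object")

# Take the first five values without a manual counter.
first_five = list(itertools.islice(stamps, 5))
for dt in first_five:
    print(dt)

# The generator resumes where it stopped, so these are items 6 to 10.
next_five = list(itertools.islice(stamps, 5))
for dt in next_five:
    print(dt)
```

A generator has no len(); counting its items, for example with sum(1 for _ in stamps), consumes it.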

Booboo

Floats representing the Unix epoch timestamp (which is what time.time() returns) use less memory than datetime.datetime objects:

>>> import datetime, sys, time
>>> sys.getsizeof(datetime.datetime.now())
48
>>> sys.getsizeof(time.time())
24

So you could do this:

import time
mylist = []
for _ in range(100000000):
    mylist.append(time.time())
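Those per-object sizes can be extrapolated to the 100 million items in the question. A rough sketch, assuming 64-bit CPython (8-byte list slots) and ignoring list over-allocation and allocator rounding, so the figures are lower bounds; the helper estimate is hypothetical, written only for this illustration:

```python
import datetime
import sys
import time

N = 100_000_000  # number of items in the question
LIMIT = 2**30    # the 1 GB limit mentioned in the comments
POINTER = 8      # assumption: each list slot is one 8-byte pointer (64-bit CPython)


def estimate(sample):
    """Rough lower bound for a list of N objects like `sample`: slots plus objects."""
    return N * (POINTER + sys.getsizeof(sample))


estimates = {
    "datetime": estimate(datetime.datetime.now()),
    "float": estimate(time.time()),
}

for name, total in estimates.items():
    print(f"{name:8s} ~{total / 1e9:.1f} GB, fits in 1 GB: {total <= LIMIT}")
```

On these assumptions, even the float version would not fit under the 1 GB Eclipse limit mentioned in the comments, because every float is still a separate object referenced from the list.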
Hadrian