First, I was not able to reproduce the results, and I don't know which versions/set-ups are used by SPOJ. For the following experiments, PyPy 5.8.0 and CPython 2.7.12 were used.
As a test case, the largest possible input file, with a size of about 110 MB, was used:
# create_data.py
print 10**6, 33
for i in xrange(10**6):
    print 10**9

>> python create_data.py > input.in
Now running /usr/bin/time -v XXX solution.py < input.in
yields:
Interpreter    Maximum resident set size
PyPy           278 MB
CPython        222 MB
PyPy needs a little more memory. CPython and PyPy use different garbage-collection strategies, and I think PyPy's trade-off is to be faster but to use more memory. The PyPy developers have a great article about their garbage collector and how it compares to CPython's.
Second, I don't trust the numbers from the SPOJ site. sys.stdin.read()
will read the whole file into memory. The Python documentation even says:
To read a file’s contents, call f.read(size), which reads some quantity of data and returns it as a string. size is an optional numeric argument. When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory.
Under the assumption that the worst case was included in their test cases, the memory usage should be at least the size of the file (110 MB), because you use sys.stdin.read(),
and maybe even twice the size, because you are copying the data.
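The original solution is not shown here, but a sys.stdin.read()-based version presumably looks something like this sketch (the exact code from the question may differ):

import sys

# the whole ~110 MB file ends up in memory as one string,
# and split() creates a second copy as a list of tokens
data = sys.stdin.read().split()
num, k = int(data[0]), int(data[1])
ans = sum(1 for token in data[2:] if int(token) % k == 0)
print(ans)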
Actually, I'm not sure the whole trouble is worth it: using raw_input()
is probably fast enough, and I would just trust Python to do The Right Thing. CPython normally buffers stdout
and stdin
(fully buffered if they are redirected to files, or line-buffered for the console), and you have to use the command-line option -u
to switch it off.
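A raw_input()-based version (the variant called raw_input() in the table further down) could look roughly like this:

num, k = map(int, raw_input().split())
ans = 0
for _ in xrange(num):
    # read one line at a time; the whole file is never held in memory
    if int(raw_input()) % k == 0:
        ans += 1
print(ans)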
But if you really want to be sure, you can use the file-object iterator of sys.stdin,
because, as the CPython man page states:
-u Force stdin, stdout and stderr to be totally unbuffered. On
systems where it matters, also put stdin, stdout and stderr in
binary mode. Note that there is internal buffering in xread‐
lines(), readlines() and file-object iterators ("for line in
sys.stdin") which is not influenced by this option. To work
around this, you will want to use "sys.stdin.readline()" inside
a "while 1:" loop.
That means your program could look like this:
import sys

num, k = map(int, raw_input().split())
ans = 0
for line in sys.stdin:
    if int(line) % k == 0:
        ans += 1
print(ans)
This has the big advantage that only around 7 MB of memory is used for this variant.
Another lesson is that you should not use sys.stdin.readline()
if you are afraid that somebody might run your program in unbuffered mode.
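For reference, the readline() variant measured in the experiments below is roughly the following sketch:

import sys

num, k = map(int, sys.stdin.readline().split())
ans = 0
for _ in xrange(num):
    # sys.stdin.readline() is fast while stdin is buffered,
    # but becomes very slow when the program is started with -u
    if int(sys.stdin.readline()) % k == 0:
        ans += 1
print(ans)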
Some further experiments (with my CPU clocked down):
                 CPython        CPython -u     PyPy          PyPy -u
original         28sec/221MB    25sec/221MB    3sec/278MB    3sec/278MB
raw_input()      29sec/7MB      110sec/7MB     7sec/75MB     100sec/63MB
readline()       38sec/7MB      130sec/7MB     5sec/75MB     100sec/63MB
readlines()      20sec/560MB    20sec/560MB    4sec/1.4GB    4sec/1.4GB
file-iterator    17sec/7MB      17sec/7MB      4sec/68MB     100sec/62MB
There are some takeaways:

- raw_input() and sys.stdin.readline() have nearly identical performance.
- raw_input() is buffered, but this buffer seems to be slightly different from the buffer of the file-object iterator, which outperforms raw_input(), at least for this file.
- The memory overhead of sys.stdin.readlines() seems to be pretty high, at least as long as the lines are short (see the sketch after this list).
- The file-object iterator behaves differently in CPython and PyPy when option -u is used: for PyPy, -u also switches off the buffering of the file-object iterator (maybe a bug?).
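The readlines() variant from the table is roughly the following sketch; every line becomes a separate small string object kept in a list, which explains the large memory footprint:

import sys

# readlines() keeps one string object per line in a list,
# so short lines carry a large per-line memory overhead
lines = sys.stdin.readlines()
num, k = map(int, lines[0].split())
ans = sum(1 for line in lines[1:] if int(line) % k == 0)
print(ans)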