I am converting a python script to cython and optimizing it for more speed. Right now i have 2 versions, on my desktop V2 is twice as fast as V1 unfortunately on my laptop V1 is twice as fast as V2 and i am unable to find out why there is such a big difference.
Both computers use:
- Ubuntu 16.04
- Python 2.7.12
- Cython 0.25.2
- Numpy 1.12.1
Desktop:
- Intel® Core™ i3-4370 CPU @ 3.80GHz × 4 64bit. 16GB RAM
Laptop:
- Intel® Core™ i5-3210 CPU @ 2.5GHz × 2 64bit. 8GB RAM
V1 - you can find the full code here. the only changes made are renaming go.py
, preprocessing.py
to go.pyx
, preprocessing.pyx
and using
import pyximport; pyximport.install()
to compile them. you can run test.py
. This version is using a 2d numpy array board
to store data in go.pyx
and list comprehension in the get_board
function in preprocessing.pyx
to process data. during the test no function is called from go.py
only the numpy array board
is used
V2 - you can find the full code here. quite some stuff has changed, below you can find a list with everything affecting this test case. Be aware, all function and variable declarations have to be in go.pxd
. you can run test.py
using this command: python test.py build_ext --inplace
the 2d numpy array is replaced by:
cdef char board[ 362 ]
and the function get_board_feature
in go.pyx
replaces numpy list comprehension:
cdef char get_board_feature( self, short location ):
# return correct board feature value
# 0 active player stone
# 1 opponent stone
# 2 empty location
cdef char value = self.board[ location ]
if value == EMPTY:
return 2
if value == self.player_current:
return 0
return 1
get_board
function in preprocessing.pyx
is replaced with a function that loops over the array and calls get_board_feature
in go.pyx
for every location
@cython.boundscheck(False)
@cython.wraparound(False)
cdef int get_board(self, GameState state, np.ndarray[double, ndim=2] tensor, int offSet ):
"""A feature encoding WHITE BLACK and EMPTY on separate planes, but plane 0
always refers to the current player and plane 1 to the opponent
"""
cdef short location
for location in range( 0, state.size * state.size ):
tensor[ offSet + state.get_board_feature( location ), location ] = 1
return offSet + 3
Please let me know if i should include any other information or run certain tests.
cmp, diff test
the V2 go.c
and preprocessing.c
files are identical.
V1 does not generate a .c
file to compare
update compared .so
files
the V2 go.so
files are different:
goD.so goL.so differ: byte 473, line 1
the preprocessing.so
files are identical, not sure what to think of that..