1

please help me.

I wrote a program to run "Game of Life" with PyQt6 but, it runs very very slow. How can I make it faster?

  1. What about using multiple cores of the CPU? How can a Python program run on multiple cores?
  2. What about doing display stuff with GPU? How can a Python program work with the GPU?
  3. What about writing the code itself faster? I know Python is not a fast language but, what can I do to improve the speed of code as much as possible?

main.py

from numpy import empty
from sys import argv
from cell import cell
from PyQt6.QtCore import Qt, QTimer
from PyQt6.QtGui import QBrush, QColor, QKeyEvent, QPainter, QPen, QPaintEvent
from PyQt6.QtWidgets import QApplication, QWidget

class Window(QWidget):
    def __init__(self, parent = None) -> None:
        super().__init__(parent)
        self.windowWidth = 1000
        self.windowHeight = 1000
        self.resolution = 2
        self.cells = empty((self.windowWidth // self.resolution, self.windowHeight // self.resolution), dtype=object)
        cellsShape = self.cells.shape
        for i in range(cellsShape[0]):
            for j in range(cellsShape[1]):
                self.cells[i][j] = cell(i, j, self.resolution)
        self.setWindowTitle("Game of Life")
        self.setGeometry((1920 - 800) // 2, (1080 - 800) // 2, 800, 800)
        self.setStyleSheet("background-color:rgb(20, 20, 20);")
        self.graphicTimer = QTimer(self)
        self.baseTimer = QTimer(self)
        self.graphicTimer.timeout.connect(self.update)
        self.graphicTimer.start(30)
        self.show()

    def run(self) -> None:
        for rows in self.cells:
            for cell in rows:
                cell.calculateNewState(self.cells)
        for rows in self.cells:
            for cell in rows:
                cell.setNewState()

    def keyPressEvent(self, event: QKeyEvent) -> None:
        key = event.key()
        if key == Qt.Key.Key_S:
            self.closeWindow()
        event.accept()

    def paintEvent(self, event: QPaintEvent) -> None:
        self.run()
        painter = QPainter()
        painter.begin(self)
        painter.setPen(QPen(QColor(20, 20, 20),  -1, Qt.PenStyle.SolidLine))
        painter.setBrush(QBrush(QColor(255, 255, 255), Qt.BrushStyle.SolidPattern))
        for rows in self.cells:
            for cell in rows:
                if cell.state == True:
                    painter.drawRect(cell.posX * cell.cellWidth, cell.posY * cell.cellWidth, cell.cellWidth, cell.cellWidth)
        painter.end()
        event.accept()

    def closeWindow(self) -> None:
        print("closing window ...")
        self.close()


if __name__ == "__main__":
    App = QApplication(argv)
    window = Window()
    exit(App.exec())

cell.py

from numpy.random import choice
from PyQt6.QtWidgets import QWidget


class cell(QWidget):
    def __init__(self, x, y, width, state = None, parent = None) -> None:
        super().__init__(parent)
        self.posX = x
        self.posY = y
        self.cellWidth = width
        self.state = state
        if self.state == None:
            self.state = choice([True, False])
        self.newState = None
        self.aliveNeighbors = 0
    
    def countAliveNeighbors(self, cells) -> None:
        cellsShape = cells.shape
        self.aliveNeighbors = 0
        for i in range(-1, 2):
            for j in range(-1, 2):
                neighbor = cells[(self.posX + i + cellsShape[0]) % cellsShape[0]][(self.posY + j + cellsShape[1]) % cellsShape[1]]
                if neighbor.state == True:
                    self.aliveNeighbors += 1
        if self.state == True:
            self.aliveNeighbors -= 1

    def calculateNewState(self, cells) -> None:
        self.countAliveNeighbors(cells)
        if self.state == False and self.aliveNeighbors == 3:
            self.newState = True
        elif self.state == True and (self.aliveNeighbors > 3 or self.aliveNeighbors < 2):
            self.newState = False
        else:
            self.newState = self.state
    
    def setNewState(self) -> None:
        self.state = self.newState
        self.newState = None

I want to write code that runs as fast as possible

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
DSA5252
  • 35
  • 6
  • 1
    You are creating 500x500=250000 widgets. I don't think that QT is the right library to implement your game – rochard4u Jun 09 '23 at 13:05
  • @rochard4u actually, Qt can be used for this, but it should be properly used. In fact, the OP is not even using those widgets as expected, they are just objects that happen to be QWidget subclasses. – musicamante Jun 09 '23 at 13:37
  • 2
    Your subclassing of QWidget for the cell seems completely useless, since you're never showing them. And since QWidgets are not so insignificant, using so many of them will not be a very good thing: at that point, just use basic python objects, it won't change speed, but it will certainly improve memory usage. – musicamante Jun 09 '23 at 13:40
  • @musicamante Thank you for your mention, I did copy the class "cell" from the class "window" and that is why it was QWidget. I will fix it. – DSA5252 Jun 09 '23 at 14:50

3 Answers3

2

You can do a lot better by changing the structure of the core to use a matrix of "alive" cells represented by a numpy array of 0/1. If you do this, you can chop out the double loop you have and let numpy do the heavy lifting of neighbor counting. You could do some work to index that and get summations, but the 2nd enhancement is to use the process of "convolution with a kernel" which is very common in image processing. Basically you are taking a kernel (in this case a 3x3 kernel to hit the neighbors) and multiplying that by the appropriate grid in your matrix for every member. scipy has an image processing package that makes this a snap. Note that you can set up the kernel to ignore out-of-bounds by filling it with zero (zero padding is the term.)

I modified your neighbor counting function a bit to separate it from the cell class for comparison, but didn't change any of the guts.

In a 500 x 500 structure, your approach takes about 1.3 seconds to calculate the number of living neighbors. So that looping costs roughly 1 second per frame to do. Using convolution, I can get it done in 0.004 seconds, around a 1000x speed-up.

Note that when I compared results I got a failed comparison because I don't think your modulo arithmetic is accurate for the edges (it is wrapping--not desired.) You can see it with my code if you print the first neighbor matrix.

After you "neighbor count" you can use a couple numpy functions to figure out what the alive matrix looks like for the next evolution and just incorporate that in your code.

Code:

# game of life core

import numpy as np
from scipy import ndimage
from time import time

dimensions = 500, 500  # 5, 5  # small test

alive_matrix = np.random.randint(0,2,dimensions[0]*dimensions[1]).reshape(dimensions)
neighbor_kernel = [[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]]

print('starting alive matrix:')
print(alive_matrix)

def countAliveNeighbors(loc, cells) -> None:
    cellsShape = cells.shape
    aliveNeighbors = 0
    # print(loc)
    for i in range(-1, 2):
        for j in range(-1, 2):
            x = (loc[0] + i + cellsShape[0]) % cellsShape[0]
            y = (loc[1] + j + cellsShape[1]) % cellsShape[1]
            # print(x,y)
            neighbor = cells[x][y]
            if neighbor == 1:
                aliveNeighbors += 1

    if cells[loc[0]][loc[1]] == 1:
        aliveNeighbors -= 1
    return aliveNeighbors

# using looping
tic = time()
living_neighbors = np.empty(dimensions, dtype=int)
for row in range(alive_matrix.shape[0]):
    for col in range(alive_matrix.shape[1]):
        living_neighbors[row][col] = countAliveNeighbors((row, col), alive_matrix)
toc = time()
print(f'using loop: {toc-tic:0.3} sec')

# using convolution with kernel
tic = time()
living_neighbors_2 = ndimage.convolve(alive_matrix, neighbor_kernel, mode='constant', cval=0)
toc = time()
print(f'using convolution: {toc-tic:0.3} sec')

# assert np.array_equal(living_neighbors, living_neighbors_2)  # <- FAILS.  I think your modulu arithmetic is bad


print('living neighbor count:')
print(living_neighbors_2)

# simplify next alive...
stays_alive = alive_matrix & np.where((living_neighbors_2 >= 2) & (living_neighbors_2 <= 3), 1, 0)
born = np.where(alive_matrix==0, 1, 0) & np.where(living_neighbors_2 == 3, 1, 0)

next_alive = stays_alive | born

print('next alive matrix:')
print(next_alive)

Output:

starting alive matrix:
[[1 1 0 ... 1 0 1]
 [1 0 1 ... 1 0 0]
 [0 0 0 ... 0 1 0]
 ...
 [0 1 0 ... 0 0 1]
 [0 1 1 ... 1 1 1]
 [0 1 0 ... 0 0 0]]
using loop: 1.27 sec
using convolution: 0.00409 sec
living neighbor count:
[[2 3 2 ... 2 3 0]
 [2 4 1 ... 4 4 2]
 [2 3 2 ... 6 2 2]
 ...
 [3 3 5 ... 6 7 4]
 [3 3 4 ... 3 3 2]
 [2 2 4 ... 3 3 2]]
next alive matrix:
[[1 1 0 ... 1 1 0]
 [1 0 0 ... 0 0 0]
 [0 1 0 ... 0 1 0]
 ...
 [1 1 0 ... 0 0 0]
 [1 1 0 ... 1 1 1]
 [0 1 0 ... 1 1 0]]
AirSquid
  • 10,214
  • 2
  • 7
  • 31
  • Circular boundary conditions aren't typical for the Game of Life, but I assumed the OP implemented it that way with `%` wrapping on purpose. (Unless you meant that it was the wrong wrapping for any plausible circular boundary condition.) Nice job showing how to sidestep the CPython interpreter overhead by recasting the problem into one that optimized C functions exist for in NumPy (a convolution). – Peter Cordes Jun 10 '23 at 00:12
  • Now I'm curious how the convolution speed compares to a plain C Game of Life, or to an AVX2 version that's another 25x faster (https://lemire.me/blog/2018/07/18/accelerating-conways-game-of-life-with-simd-instructions/). Dan Lemire reports that his scalar C version ran in 0.5 sec per iteration on a 10k x 10k grid. So that's 5 ns per cell vs. your 16.36 ns per cell with convolution, pretty close given the unknown speed ratios of the CPUs you used. – Peter Cordes Jun 10 '23 at 00:16
  • Thx. Presumably, the `scipy` convolution would be comparable to the C version as it relies on `numpy` functionality. There's probably a little fat that could be trimmed, but I would think it would be in the margins. Of course, this is all focused on the neighbor counting, there's probably a bit more on the plate for "the rest of the game" and visualization – AirSquid Jun 10 '23 at 01:02
  • Cellular automata are interesting and cool, and simulating them quickly is thus also interesting, even without doing anything to display the results. A C game-of-life implementation doesn't have to do any multiplies, just add integers, so part of that factor of 3 might be in extra instructions for those scalar integer `imul` instructions, including processing the centre element with a multiplier of 0. (I wonder if the integer convolution auto-vectorizes with AVX2 `vpmulld` for 32-bit elems? Maybe not since it still loses to plain C game-of-life, which probably doesn't auto-vectorize.) – Peter Cordes Jun 10 '23 at 01:15
1

The simplest fix you can set for yourself is to pre-calculate your cell neighbors rather than running the calculation within an O(n^2) complexity. The modulo operation can be one of the slower operations to perform, so minimizing the number of times you run it will be good.

Ultimately you want to minimize or eliminate regular O(n^2) calculations to minimize CPU strain. Given that you’re not doing anything super complicated here, I think this is a simple and quick performance gain you can realize.

Remember that Python passes values by reference, so just initialize your neighbors in __init__ rather than passing them to calculateNewState.

If you’re running into performance issues, a good helper is dis.dis to see the steps in assembly code. Find your bottlenecks as each step is something your CPU needs to manage. Then, if you can remove it, your code will have fewer steps and run faster. Sometimes without seeing the low level code, it can be hard to understand where your bottlenecks really are.

smcjones
  • 5,490
  • 1
  • 23
  • 39
  • I think I do not get your purpose for pre-calculate cell neighbors. can you explain it more or help me with changing the code? I know that Python passes values to functions by reference but, how do I put neighbors in `__init__` before making them all? I mean how to send cells as an argument to the cell class? is it even possible? and thanks for the Introduction to the "dis" module I did not know it before. – DSA5252 Jun 09 '23 at 14:46
  • I think the `%` operations are to handle circular boundary conditions? So yeah, that could optimize away if you just handle the boundaries specially, using faster code in the middle of rows / columns. I'm not seeing any O(n^2) work, though. It loops over all cells once per evolution iteration. For each cell, it checks its neighbours, which is a constant factor of 8 neighbours; calculating the next state for a cell doesn't get worse when there are more total cells. – Peter Cordes Jun 09 '23 at 16:42
  • What could be optimized is the constant factor (e.g. the `%`), and computing the new state into a new array, like update the screen from alternating buffers like double-buffering instead of running another pass to copy. And yeah there's some redundancy in the fact that each cell is the neighbour of multiple cells, so processing along a row could maybe reuse some accesses if you unroll a loop. To make it actually fast, you'd use C or C++, and use AVX2 instructions to compute 32 cells at once: https://lemire.me/blog/2018/07/18/accelerating-conways-game-of-life-with-simd-instructions/ – Peter Cordes Jun 09 '23 at 16:44
  • Each cell is calculated with each window refresh and is calculated against each neighbor. I didn’t read this enough to speak confidently about what each step is doing but the modulo operation run deep inside the loop is expensive. Each time the window is redrawn (f), a check is made for each cell (c) for each neighbor (8). So maybe not n^2 but certainly closer to polynomial than to constant time. Each frame is only O(n), but it compounds with each frame. – smcjones Jun 09 '23 at 20:44
  • Only just saw your reply since you didn't @peter notify me. The complexity class is O(8*n) work per frame, which is O(n). Constant factors aren't part of the complexity *class*, but are a big deal in practice. e.g. writing the same algorithm in C might run 100x faster than Python over a wide range of grid sizes, so it's not how it scales with N that's the problem, it's implementation details (and the overhead of a slow Python interpreter like CPython, vs. using the recent Python compiler.) – Peter Cordes Jun 09 '23 at 21:53
  • Anyway, yes quite likely some speedup might be possible in Python with simpler neighbour addressing away not near edges. CPython interpreter overhead is so bad that `%` itself might not be a big problem, e.g. in CPython 3.6.7 - [Why are bitwise operators slower than multiplication/division/modulo?](https://stackoverflow.com/q/54047100) . But definitely you want to simplify so there are fewer total operations. Perhaps doing some list operations on whole rows at once to make temporaries could help, like summing the prev and next rows so there's half the list indexing per cell. – Peter Cordes Jun 09 '23 at 21:58
  • Also, does Python allow negative array indexes to wrap, like internally doing that modulo anyway? – Peter Cordes Jun 09 '23 at 21:59
-1

You could try importing and using threading so that each task that doesn't necessarily need to be completed in order can be done simultaneously and speed things up. It could look something like this

 import threading
    x = threading.Thread(target=The_Target_Function, args=("YOUR ARGUMENTS",))
    x.start()
Tim
  • 1
  • 1
  • No, they can't, as UI elements are not thread safe. They'd need to compute the whole grid in a separate thread and then connect the result to the main one, but that wouldn't improve anything in this situation, with that amount of objects the computation would only be moved on another thread, but since Python doesn't allow concurrency, it would practically be the same, if not even worse. – musicamante Jun 09 '23 at 14:08
  • 1
    @musicamante minor nitpick: it allows concurrency, but threading doesn't allow parallelism. – Andras Deak -- Слава Україні Jun 09 '23 at 20:05
  • 1
    @AndrasDeak--СлаваУкраїні Yes, thanks, you're correct, my explanation was imprecise. – musicamante Jun 09 '23 at 23:14