We are building a Python microservices application with PostgreSQL as the service datastore. At first glance Nameko seems a good starting point. However, the Nameko documentation section on Concurrency includes this statement:
Nameko is built on top of the eventlet library, which provides concurrency via “greenthreads”. The concurrency model is co-routines with implicit yielding.
Implicit yielding relies on monkey patching the standard library, to trigger a yield when a thread waits on I/O. If you host services with nameko run on the command line, Nameko will apply the monkey patch for you.
Each worker executes in its own greenthread. The maximum number of concurrent workers can be tweaked based on the amount of time each worker will spend waiting on I/O.
Workers are stateless so are inherently thread safe, but dependencies should ensure they are unique per worker or otherwise safe to be accessed concurrently by multiple workers.
Note that many C-extensions that are using sockets and that would normally be considered thread-safe may not work with greenthreads. Among them are librabbitmq, MySQLdb and others.
Our architect suggests that Nameko is therefore not going to fly, because although the psycopg2 PostgreSQL driver is advertised as thread safe:
Its main features are the complete implementation of the Python DB API 2.0 specification and the thread safety (several threads can share the same connection). It was designed for heavily multi-threaded applications
the same documentation goes on to qualify this:
The above observations are only valid for regular threads: they don’t apply to forked processes nor to green threads. libpq connections shouldn’t be used by forked processes, so when using a module such as multiprocessing or a forking web deploy method such as FastCGI make sure to create the connections after the fork. Connections shouldn’t be shared either by different green threads: see Support for coroutine libraries for further details.
Warning: Psycopg connections are not green thread safe and can’t be used concurrently by different green threads. Trying to execute more than one command at a time using one cursor per thread will result in an error (or a deadlock on versions before 2.4.2). Therefore, programmers are advised to either avoid sharing connections between coroutines or to use a library-friendly lock to synchronize shared connections, e.g. for pooling.
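To make the "avoid sharing connections" advice concrete, here is a minimal sketch of a pool that hands each connection to exactly one worker at a time. It is only an illustration under stated assumptions: `FakeConnection`, `ExclusivePool` and `run_workers` are hypothetical names, not psycopg2 or Nameko APIs, and it runs on ordinary standard-library threads rather than greenthreads. Whether the blocking `queue.Queue` would behave as a "library-friendly lock" under eventlet's monkey patching is an assumption we have not tested.

```python
import queue
import threading
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a DB connection; not a psycopg2 API."""

    def __init__(self, name):
        self.name = name
        self.in_use_by = None  # worker currently holding this connection

    def execute(self, worker, statement):
        # Fail loudly if a worker uses a connection it has not checked out.
        if self.in_use_by != worker:
            raise RuntimeError(
                f"{self.name} used by {worker} while held by {self.in_use_by}"
            )
        return f"{worker}:{statement}"


class ExclusivePool:
    """Hands each connection to exactly one worker at a time."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(FakeConnection(f"conn-{i}"))

    @contextmanager
    def checkout(self, worker):
        conn = self._free.get()  # blocks until a connection is free
        conn.in_use_by = worker
        try:
            yield conn
        finally:
            conn.in_use_by = None
            self._free.put(conn)  # return the connection for reuse


def run_workers(pool, count):
    """Run `count` workers, each doing one query on its own checked-out connection."""
    results, errors = [], []

    def work(name):
        try:
            with pool.checkout(name) as conn:
                results.append(conn.execute(name, "SELECT 1"))
        except Exception as exc:  # collected so the caller can inspect failures
            errors.append(exc)

    threads = [
        threading.Thread(target=work, args=(f"w{i}",)) for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), errors
```

With a pool of two connections and five workers, every worker still gets a connection to itself; the other workers simply wait in `checkout` until one is returned.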
The normal service configuration would have the service hold a repository with a connection shared by threads, with repository access methods using sessions on that connection scoped to the method.
Our architect suggests that even if we were to go with a connection and session per thread, implicit yielding could still cause trouble: if we perform other I/O between data access calls on a session (for example, a file write via logging), we might suffer an implicit context switch, which could then cause issues on the session after the logging call.
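The scenario we are worried about can be modelled with a toy cooperative scheduler. This is a sketch only: it uses plain Python generators in place of eventlet greenthreads, each `yield` stands in for an implicit yield on I/O, and `RecordingConnection`, `worker`, `run_round_robin` and `simulate` are hypothetical names invented for illustration. It says nothing about how psycopg2 itself reacts; it only shows which statements reach which connection when a switch happens during the logging call.

```python
from collections import deque


class RecordingConnection:
    """Hypothetical connection that just records the statements it receives."""

    def __init__(self):
        self.statements = []

    def execute(self, statement):
        self.statements.append(statement)


def worker(name, conn, log):
    """One unit of work; every `yield` marks an I/O wait where a switch may occur."""
    conn.execute(f"{name}: BEGIN")
    yield  # waiting on the database
    log.append(f"{name}: wrote log line")
    yield  # waiting on the log file write
    conn.execute(f"{name}: COMMIT")


def run_round_robin(workers):
    """Tiny cooperative scheduler: resume each worker in turn until all finish."""
    ready = deque(workers)
    while ready:
        current = ready.popleft()
        try:
            next(current)
            ready.append(current)  # not finished yet, schedule it again
        except StopIteration:
            pass  # worker completed


def simulate(shared):
    """Run workers "a" and "b" with either one shared connection or one each."""
    log = []
    if shared:
        conn = RecordingConnection()
        conns = {"a": conn, "b": conn}
    else:
        conns = {"a": RecordingConnection(), "b": RecordingConnection()}
    run_round_robin([worker(n, conns[n], log) for n in ("a", "b")])
    return {n: list(c.statements) for n, c in conns.items()}
```

In this model, `simulate(shared=True)` shows the two workers' statements interleaved on the single connection, while `simulate(shared=False)` shows each connection receiving only its own worker's statements in order, even though a switch still happens during logging. Whether the real driver behaves like the second case under eventlet is the open question.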
Is there any reasonable way we can use Nameko in this context, or is it doomed as our architect suggests? Is there any way we can make this work without having to write our own microservice code, e.g. using Kombu?
Additional note: a comment on this page regarding database drivers states:
You may use any database driver compatible with SQLAlchemy provided it is safe to use with eventlet. This will include all pure-python drivers.
It goes on to list pysqlite & pymysql.
Would using either of the pure-Python drivers, pg8000 or py-postgresql, put us in the clear threading-wise? Is the issue here greenthreads in combination with the psycopg2/3 driver, which uses C code, or is it fundamentally Nameko's use of greenthreads?