
Let's say I have Python process 1 on machine 1 and Python process 2 on machine 2. Both processes run the same code and handle data sent by a load balancer.

Both processes need to interact with a database (Postgres, in my case), so each process needs to know which database to talk to, has to have the right models deployed on its machine, and so on. It's all too tightly coupled.

Ideally there would be a separate process dealing with the database concerns: connections, keeping up with db model changes, issuing queries, and so on. All processes 1 and 2 should have to do is say "I have some JSON data that needs to be saved or updated in this table" or "I need this data in JSON format".
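To make that concrete, here is a hypothetical sketch of the interface I'd want each worker process to see. Everything here (`DataService`, `save`, `fetch`) is made up for illustration; the in-memory dict stands in for whatever central service would actually hold the database logic:

```python
import json

class DataService:
    """Hypothetical client-side interface; a real implementation would
    forward these calls to a central database service instead of
    keeping data in memory."""

    def __init__(self):
        self._tables = {}  # in-memory stand-in for the remote service

    def save(self, table, payload):
        # Worker just hands over JSON; it knows nothing about models.
        self._tables.setdefault(table, []).append(json.loads(payload))

    def fetch(self, table):
        # Worker gets its data back as JSON.
        return json.dumps(self._tables.get(table, []))

svc = DataService()
svc.save("users", '{"name": "alice"}')
```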

Maybe I'm asking the impossible, but is there any Python solution that would at least make life a little easier when it comes to having distributed processes interacting with relational databases in the most decoupled way possible?

avatar
  • Why not have each worker just access the database? After all, the point of ACID databases is to handle just that. – Marcin Apr 06 '12 at 14:47
  • @Marcin I was thinking to use some kind of ORM to access the database and that's when I realized there is some pain maintaining models and everything that comes with it for each process/machine I'm going to add. But if there is too much complication I might just have each process access the database directly as you mentioned. – avatar Apr 06 '12 at 15:18
  • I'm not sure what pain you're seeing. How is it any more complex than setting up a database connection? – Marcin Apr 06 '12 at 15:21
  • @Marcin Annoyance rather than pain would probably be more accurate to describe the issue. Let's say if I want to use the Django ORM to access the database from each machine and I have 20 of them. Each time when I do a database change I have to change the Django models on each machine. It would be nice to have this centralized so I would only need to change the models in one place rather than 20. – avatar Apr 06 '12 at 15:44
  • What do you mean "You need to change the models on each machine"? The ORM will read from the database, and as long as you do read from the database, e.g. on startup, it will be fine. I think there is a problem with your design if using an ORM is causing problems. – Marcin Apr 06 '12 at 15:48
  • @Marcin The more I look into this, the more I think it's really a question of whether to use an ORM in the first place. I wanted some level of abstraction regardless of the database I use, which is why an ORM came to mind. But I think if I just use "SQLAlchemy Core" http://docs.sqlalchemy.org/en/latest/index.html I might get around using an ORM and still have some level of abstraction. – avatar Apr 06 '12 at 16:06
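The approach discussed in the comments (reading the schema from the live database at startup instead of maintaining models on every machine) can be sketched with SQLAlchemy Core's table reflection. This assumes SQLAlchemy 1.4+ is installed, and SQLite stands in for Postgres, where you'd use a `postgresql://` URL instead:

```python
from sqlalchemy import MetaData, Table, create_engine, insert, select, text

# SQLite in-memory db stands in for Postgres here.
engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"))

# Reflect the table definition from the live database at startup --
# no hand-maintained model files to sync across 20 machines.
metadata = MetaData()
users = Table("users", metadata, autoload_with=engine)

with engine.begin() as conn:
    conn.execute(insert(users), [{"name": "alice"}, {"name": "bob"}])
    rows = conn.execute(select(users)).mappings().all()
```

If the schema changes, restarting the process picks up the new definition from the database itself.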

2 Answers


You could put SQLAlchemy behind a RESTful interface.

Here's a thread on writing a simple RESTful server in Python: Recommendations of Python REST (web services) framework?
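As a minimal sketch of the idea, here is a toy read-only JSON-over-HTTP service using only the standard library (sqlite3 standing in for Postgres; endpoint names and the `ALLOWED_TABLES` whitelist are my own assumptions, not from any framework):

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# One shared connection; check_same_thread=False because the server
# thread, not the main thread, runs the request handlers.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice'), ('bob')")
conn.commit()

ALLOWED_TABLES = {"users"}  # whitelist -- never interpolate raw path segments

class DbHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        table = self.path.strip("/")
        if table not in ALLOWED_TABLES:
            self.send_error(404)
            return
        cur = conn.execute(f"SELECT * FROM {table}")  # safe: whitelisted name
        cols = [d[0] for d in cur.description]
        body = json.dumps([dict(zip(cols, r)) for r in cur.fetchall()]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), DbHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client process needs only the URL -- no models, no DSN.
url = f"http://127.0.0.1:{server.server_port}/users"
rows = json.loads(urllib.request.urlopen(url).read())
server.shutdown()
```

The client side knows nothing about the schema or the database engine; swapping sqlite for Postgres (or putting SQLAlchemy behind the handler) changes nothing for the workers.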

bpgergo

If it's db connection information you're interested in, I recently wrote a service for this. Each process has token(s) set in its configuration and uses those to query the service for db connection info. The data layer uses that info to create connections; no DSNs are stored on the clients. On the server side, you just maintain a dictionary of token->DSN mappings.

You could do connection pooling with bpgergo's suggestion, but you should still include an authentication or identification method. That way, if there's a network intrusion, a malicious host can't simply impersonate one of your clients.

The service implementation is broken into a few parts:

  • A RESTful service that supports calls of the form http://192.168.1.100/getConnection?token=mytokenstring
  • A key-value storage system that stores a mapping like {'mytokenstring': {'dbname': 'db', 'ip': '192.168.1.101', 'user': 'dbuser', 'password': 'password', ..}}
    • This system shouldn't be on the front end network, but if your web tier is compromised, this approach doesn't buy you any protection for the db.
  • A db object that, on instantiation, retrieves a DSN using the appropriate token and creates a new db connection.
    • You should re-use this connection object for the rest of the page response if you can. The response time from the service will be fast, but there's a lot more overhead required for db connections.

Once implemented, some care is required for handling schema incompatibilities when switching the DSN info behind a token. You may be able to resolve this by pinning a token to a user session, etc.
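The client side of the scheme above can be sketched in a few lines. The registry dict, `get_connection_info`, and the `Db` class are hypothetical names, and sqlite stands in for Postgres (with psycopg2 you would build a real connection from the returned fields); in the real service the registry lives behind the REST endpoint on a non-public network, not in client code:

```python
import sqlite3

# Hypothetical server-side token -> DSN registry (the dictionary the
# /getConnection endpoint would consult).
DSN_REGISTRY = {
    "mytokenstring": {
        "dbname": "db",
        "ip": "192.168.1.101",
        "user": "dbuser",
        "password": "password",
    }
}

def get_connection_info(token):
    """What the /getConnection call would return for a valid token."""
    info = DSN_REGISTRY.get(token)
    if info is None:
        raise KeyError("unknown token")  # treat as an auth failure
    return info

class Db:
    """On instantiation, resolves its configured token to a DSN and
    opens a connection; the client never stores the DSN itself."""

    def __init__(self, token):
        self.dsn = get_connection_info(token)
        # Real version: psycopg2.connect(dbname=self.dsn["dbname"],
        #     host=self.dsn["ip"], user=..., password=...)
        self.conn = sqlite3.connect(":memory:")

db = Db("mytokenstring")
```

Re-use the `Db` object's connection for the rest of the request if you can, since the connection setup is the expensive part.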

Dana the Sane