3

I have a more open-ended, design-oriented question for you here. I have a background in Python but not in web or async programming. I am writing an app to save data collected from websockets 24/7, with the aim of minimising data loss.

My initial thought is to use Python 3.6 with asyncio, aiohttp and aiofiles.

I don't know whether to use one coroutine per websocket connection or one thread per websocket connection. Performance may not be as much of an issue as good connection error handling.

jam123
  • 129
  • 1
  • 7
  • My first thought is: use a real database, e.g. Postgres, not plain files and aiofiles. – SColvin Dec 02 '17 at 19:02
  • Since I only need infrequent access to the historical files, plus compression, I don't see the advantage. With a database I will need to maintain the schema, migrations, ... – jam123 Dec 02 '17 at 19:05
  • You'll need to maintain the schema and migrations regardless, just without all the help of a database. Are you going to implement a write-ahead log, and caching in memory? Will the thread management (implicit in aiofiles) be as sophisticated as with a db? What about binary data packing, and ACID transactions so incomplete writes don't corrupt your data? If you do the job properly you'll end up with much of a database, but with much more work, much more to maintain, and a worse result. – SColvin Dec 03 '17 at 10:33
  • Is the data numerical or text? – SColvin Dec 03 '17 at 10:41
  • It is mostly numeric, but also has text and dates. I'm more concerned with saving the history than with using (all of) it, so I don't need to worry about a schema up front. The schema differs across the 20 websocket data sources, so maintaining it seems costly when I could just save the raw data directly (and compress it). Incomplete writes? Indeed, I am assuming I will not get corrupt data using aiofiles. I don't know whether this would be an issue; I'd be interested to know more about that aspect. – jam123 Dec 03 '17 at 12:04
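The "save the raw data directly and compress it" approach from the comments could be sketched as below, using only the standard library: one timestamped JSON line per raw message, appended to a gzip file. The file path and record layout are illustrative assumptions, not anything from the question. Note that gzip files support multiple concatenated members, so each append-mode open adds a valid member that readers see as one continuous stream. Opening the file for every message is simple but slow; a real app would keep the file open or batch writes.

```python
import gzip
import json
import time

def append_raw(path: str, raw: str) -> None:
    """Append one raw websocket message, timestamped, as a JSON line.

    Each open in append mode writes a new gzip member; gzip readers
    treat the concatenation of members as a single stream, so a crash
    between appends cannot corrupt previously written records.
    """
    # Record layout is a hypothetical example: wall-clock time + raw payload.
    line = json.dumps({"ts": time.time(), "raw": raw}) + "\n"
    with gzip.open(path, "at", encoding="utf-8") as f:
        f.write(line)
```

Because the payload is stored verbatim, no per-source schema or migrations are needed; replaying history later is just decompressing and parsing lines.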

1 Answer

1

To answer your actual question: threads and coroutines will be equally reliable, but coroutines are much easier to reason about, and much of the modern existing code you'll find to read or copy uses them.

If you want to benefit from multiple cores, it is much better to use multiprocessing than threads, to avoid the trickiness of the GIL.
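The coroutine-per-connection approach, combined with restarting a source when it errors, could look something like this minimal sketch using only the standard library. The `listen` callable is a stand-in for a real read loop (e.g. an aiohttp `ws_connect` loop that saves each message); the supervisor and backoff parameters are illustrative assumptions:

```python
import asyncio

async def supervise(name: str, listen,
                    initial_backoff: float = 1.0,
                    max_backoff: float = 30.0) -> None:
    """Keep one source's listener alive: restart on any error with
    exponential backoff, so a failure in one source never affects
    the others."""
    backoff = initial_backoff
    while True:
        try:
            await listen(name)
            return  # listener finished cleanly (e.g. shutdown requested)
        except Exception as exc:
            print(f"{name}: {exc!r}; reconnecting in {backoff:.2f}s")
            await asyncio.sleep(backoff)
            backoff = min(backoff * 2, max_backoff)

async def main(sources, listen) -> None:
    # One independent coroutine per websocket source, all in a single
    # process and thread; asyncio interleaves them while they wait on I/O.
    await asyncio.gather(*(supervise(s, listen) for s in sources))
```

In the real app, `listen` would open the websocket and write each message out (e.g. with the raw-file or database approach discussed in the comments); because each source has its own supervisor coroutine, a dropped connection only loses data for that one source while it reconnects.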

SColvin
  • 11,584
  • 6
  • 57
  • 71
  • Thank you. The app would be mostly I/O-bound (listening to 20 websockets and saving a few gigabytes of compressed data per day), so I am guessing a single process is enough in terms of performance. I do wonder about robustness, however: if I use one process per data source, an error will only cause data loss for one source. Of course, I should be able to handle errors with coroutines too (a callback to restart when I detect an error), but I don't know what's best in practice. – jam123 Dec 03 '17 at 12:09