1

I am writing a simple web-scraper in python3.7 using asyncio and multiprocessing. The general architecture looks as follows:

for i in range(self.number_processes - len(processes)):
    p = Process(target=AsyncProcessWrapper().run_main_loop)
    time.sleep(0.3)
    p.start()
    start_time = time.time()

where AsyncProcessWrapper is defined as:

class AsyncProcessWrapper:

    def __init__(self):
        resource_database = Database()
        # This is where the async logic takes place, following a producer-consumer pattern.
        # I will hide this logic for simplicity.
        self.main = Main(database=resource_database)

    def run(self):
        asyncio.run(self.main.run_main_loop())

The database connection is establishes once, and never closed (because this is a scraper, it takes a while until data is downloaded).

The database class init looks as follows:

class Database:

    def __init__(self):
        db_url = os.getenv('DatabaseUrl')
        self.engine = create_engine(db_url, encoding='utf8', poolclass=NullPool)

        Session = sessionmaker()
        Session.configure(bind=self.engine)
        self.session = Session()

    def create_url_entity(self, urls):

        for url in urls:
            url_entity_obj = URLEntity(
                url=url,
                engine_version=self.engine_version
            )
            to_insert.append(url_entity_obj)

        self.session.bulk_save_objects(to_insert)

For processes=2 (or any higher number), I don't get any errors when I run the above snippets on my MacOS which has 2 physical cores. However, the moment that I set processes=2 on my Ubuntu 18.05 machine which has 8 physical cores, I keep getting the following error.

david@bob:~$ screen -r myproject

    cursor, statement, parameters, context
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL SYSCALL error: EOF detected


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/david/myproject/myproject/engine/core.py", line 25, in run_main_loop
    asyncio.run(self.main.run_main_loop())
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/home/david/myproject/myproject/core/main.py", line 150, in run_main_loop
    self.resource_database.create_markup_record(self.buffer_markup_records)
  File "/home/david/myproject/myproject/resources/db.py", line 281, in create_markup_record
    .join(RawMarkup, isouter=True) \
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3373, in all
    return list(self)
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
    return self._execute_and_instances(context)
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
    distilled_params,
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    cursor, statement, parameters, context
  File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL SYSCALL error: EOF detected

[SQL: SELECT url.id AS url_id, url.url AS url_url, raw_markup.id AS raw_markup_id 
FROM url LEFT OUTER JOIN raw_markup ON url.id = raw_markup.url_id 
WHERE url.url IN (%(url_1)s, %(url_2)s, %(url_3)s, %(url_4)s, %(url_5)s, %(url_6)s, %(url_7)s, %(url_8)s, %(url_9)s, %(url_10)s, %(url_11)s, %(url_12)s, %(url_13)s, %(url_14)s, %(url_15)s, %(url_16)s, %(url_17)s, %(url_18)s)]
[parameters: {'url_1': 'placeholder_string_1', 'url_2': 'placeholder_string_2', 'url_3': 'placeholder_string_3', ...'url_18': 'placeholder_string_18'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

The database on my Ubuntu is significantly larger (the ubuntu machine has 34'744'900 rows in its largest table, whereas my local machine has about 1'000'000 rows).

Additional information:

  • I am prototyping my application on my mac (version 10.15). I am then uploading my code to execute on a cluster. The Postgres versions for both machines were equivalent (Postgres 10), now
  • I upgraded the cluster's Postgres version to 12. This does not change anything.
  • I played around with committing more often, but this does not change anything.
  • The error keeps increasing when I spawn even more processes (i.e. processes=7).
  • I use sqlalchemy==1.3.
  • Increasing the niceness nice -n 17 python -m run does not change anything (makes the async workers run faster, it feels).
  • I read somewhere that it could be due to libpq which behaves weird on Ubuntu. Any ideas if this could cause it?

Any idea what could cause this error, and what I could try to resolve this?


EDIT:

I just started two single (no multiprocessing) async processes using bash with the following command on my MacOS:

trap 'kill %1' SIGINT
python -m myproject.core.main | tee 1.log | sed -e 's/^/[Command1] /' & python -m myproject.core.main | tee 2.log | sed -e 's/^/[Command2] /'

which now also just blocks execution completely and returns a Postgres SSL Error.

SSL error in data received
protocol: <asyncio.sslproto.SSLProtocol object at 0x115f0b828>
transport: <_SelectorSocketTransport fd=15 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/asyncio/sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/asyncio/sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2609)

The same bash-problem also happens on the Ubuntu system.


EDIT 2:

After configuring the logs, I get the following error logs (only last few lines). I use 7 processes in python just to make errors appear faster:

 2021-01-05 12:22:56.597 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,16,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,0,LOG,00000,"statement:   UPDATE url_queue SET crawler_processing_sentinel=true WHERE url_queue.id IN (6815557, 6813399, 6813430, 6814589, 6814591, 6813688, 6813689, 6813691, 6813717, 6813777, 6813819,  6813926, 6813929, 6814067, 6814217, 6814237, 6814257, 6814967, 6815825, 6815882, 6815887, 6815890, 6814625, 6814626, 6814629, 6814926, 6814929, 6814901, 6815026, 6815035,       6815058, 6815096, 6815008, 6815009, 6815046, 6815120, 6815493, 6815521, 6815534, 6815715, 6815719, 6815730, 6815733, 6815746, 6815772, 6815797, 6816195, 6816225, 6816234,       6816583, 6816618, 6816860, 6816896, 6817134, 6816944, 6816955, 6818076, 6818164, 6818171, 6816983, 6817018, 6817024, 6817029, 6817030, 6818130, 6817320, 6817385, 6817436,       6817481, 6817487, 6817615, 6817622, 6817711, 6817854, 6817857, 6817877, 6817881, 6817885, 6817892, 6818000, 6818006, 6818082, 6818127, 6818702, 6818818, 6818852, 6819325,       6819329, 6819338, 6819394, 6819399, 6819424, 6819156, 6819839, 6819842, 6819857, 6819861, 6819318, 6819320, 6819371, 6819376, 6819664, 6820000, 6820018, 6819774, 6819778,       6819787, 6819804, 6819818, 6819838, 6819870, 6819951, 6820299, 6820303, 6822442, 6822970, 6822988, 6822996, 6820077, 6820288, 6820316, 6820328, 6820337, 6820391, 6820395,       6820398, 6820399, 6820425, 6820432, 6820619, 6820831, 6820895, 6820958, 6820724, 6821014, 6821018, 6821027, 6821032, 6821041, 6821063, 6821151, 6821171, 6821183, 6821189,       6821207, 6821234, 6821249, 6821253, 6821263, 6821389, 6821468, 6822231, 6822292, 6821796, 6822756, 6823032, 6823105, 6823127, 6821720, 6821722, 6821746, 6821782, 6821818,       6821819, 6821821, 6821789, 6821835, 6821844, 6821862, 6821914, 6821989, 6822004, 6822006, 6822020, 6822028, 6822866, 6822909, 6822913, 6822933, 6822113, 6823419, 6823423,       6823441, 6823453, 6823469, 6823472, 6822360, 6822404, 6822413, 6822446, 6822450, 6822370, 6822398, 6824217, 6822459, 6822468, 6822567, 6822587, 6822603, 6822623, 6822635,       6822639, 6822650, 6822667, 6822669, 6822678, 6822712, 6822718, 6822720, 6822724, 6823138, 6823380, 6823383, 6823562, 6823595, 6822958, 6823026, 6823044, 6823053, 6823087,       6823237, 6823151, 6823418, 6823754, 6823772, 6823774, 6823785, 6823790, 6823326, 6823331, 6823336, 6823338, 6823341, 6823450, 6823455, 6823513, 6823545, 6823551, 6823557,       6823561, 6823573, 6823713, 6823958, 6823967, 6823971, 6823991, 6824324, 6824503, 6824593, 6825231, 6825292, 6825296, 6824336, 6824383, 6824440, 6824760, 6824767, 6824785,       6824809, 6824636, 6824680, 6824800, 6824805, 6824892, 6824894, 6824839, 6825482, 6825877, 6825539, 6825575, 6825593, 6825595, 6824961, 6824976, 6825948, 6826415, 6826433,       6826439, 6825283, 6825300, 6825301, 6826988, 6827000, 6827015, 6826745, 6826782, 6827283, 6825456, 6825461, 6825462, 6827730, 6827735, 6825669, 6825680, 6825700, 6825843,       6825997, 6826059, 6828078, 6828080, 6828085, 6825993, 6826007, 6826012, 6826015, 6826018, 6826022, 6826029, 6826035, 6826272, 6826281, 6826296, 6826298, 6826304, 6826340,       6826460, 6826646, 6826685, 6826771, 6826772, 6826853, 6826880, 6826897, 6826899, 6827037, 6827040, 6827063, 6827133, 6827142, 6827192, 6827203, 6828774, 6828785, 6827385,       6827401, 6827825, 6827569, 6827580, 6827936, 6827939, 6827946, 6827955, 6827993, 6828067, 6830231, 6830256, 6830258, 6828117, 6828179, 6828183, 6828207, 6828209, 6828211,       6829827, 6828317, 6828345, 6828353, 6828363, 6828371, 6828586, 6828588, 6828683, 6828696, 6828698, 6828717, 6828726, 6828731, 6828748, 6829184, 6829239, 6829272, 6829309,       6830332, 6828997, 6829055, 6829067, 6829639, 6829652, 6829661, 6829663, 6829252, 6829260, 6829298, 6829398, 6829400, 6829411, 6829416, 6829368, 6829372, 6829376, 6829378,       6829475, 6829527, 6829535, 6830527, 6829713, 6829715, 6829730, 6829890, 6829900, 6829934, 6830124, 6830228, 6830241, 6830245, 6830274, 6830430, 6830453, 6830450, 6830521,       6830803, 6831342, 6832991, 6835788, 6830676, 6830718, 6830719, 6830738, 6830842, 6830931, 6830883, 6830904, 6830927, 6830935, 6831322, 6831469, 6831041, 6831055, 6831071,       6831079, 6831122, 6831155, 6831205, 6831239, 6831246, 6831271, 6831677, 6831896, 6832126, 6832169, 6832176, 6832182, 6832207, 6832224, 6832238, 6832288, 6832304, 6832323,       6832687, 6832422, 6832423, 6832428, 6832437, 6832457, 6832589, 6832655, 6832720, 6832783, 6832802, 6833082, 6833108, 6832908, 6832927, 6832939, 6832955, 6833047, 6833063,       6833667, 6833170, 6833173, 6833191, 6833207, 6833220, 6833222, 6833238, 6833241, 6833353, 6833361, 6833485, 6833501, 6833544, 6833585, 6834233, 6834240, 6834266, 6834283,       6834293, 6833899, 6833902, 6833910, 6833866, 6834579, 6834625, 6834049, 6834103, 6834105, 6834114, 6834125, 6834134, 6834139, 6834149, 6834337, 6835768, 6835829, 6834530,       6834536, 6834562, 6834576, 6835135, 6835141, 6836178, 6834857)",,,,,,,,"exec_simple_query, postgres.c:1045",""
 2021-01-05 12:22:56.921 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,17,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,77447627,LOG,00000,       "statement: SELECT count(*) AS count_1
 FROM (SELECT raw_markup.id AS raw_markup_id, raw_markup.url_id AS raw_markup_url_id, raw_markup.markup AS raw_markup_markup, raw_markup.spider_processing_sentinel AS            raw_markup_spider_processing_sentinel, raw_markup.spider_processed_sentinel AS raw_markup_spider_processed_sentinel, raw_markup.spider_skip AS raw_markup_spider_skip,           raw_markup.version_spider AS raw_markup_version_spider, raw_markup.updated_at AS raw_markup_updated_at
 FROM raw_markup) AS anon_1",,,,,,,,"exec_simple_query, postgres.c:1045",""
 2021-01-05 12:22:56.954 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,18,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,77447627,LOG,00000,       "statement: SELECT count(*) AS count_1
 FROM (SELECT url_queue.id AS url_queue_id, url_queue.url_id AS url_queue_url_id, url_queue.crawler_processing_sentinel AS url_queue_crawler_processing_sentinel, url_queue.      crawler_processed_sentinel AS url_queue_crawler_processed_sentinel, url_queue.crawler_skip AS url_queue_crawler_skip, url_queue.retries AS url_queue_retries, url_queue.         occurrences AS url_queue_occurrences, url_queue.version_crawl_frontier AS url_queue_version_crawl_frontier, url_queue.created_at AS url_queue_created_at
 FROM url_queue) AS anon_1",,,,,,,,"exec_simple_query, postgres.c:1045",""
 2021-01-05 12:22:57.153 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,19,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,77447627,LOG,00000,       "statement: SELECT url.url AS url_url, url_queue.id AS url_queue_id
 FROM url_queue, url LEFT OUTER JOIN raw_markup ON url.id = raw_markup.url_id
 WHERE url.id = url_queue.url_id AND raw_markup.id IS NULL AND url_queue.retries < 4 AND url_queue.crawler_processing_sentinel = false AND ((url.url LIKE '%' || 'thomasnet.com'  || '%') OR (url.url LIKE '%' || 'go4worldbusiness.com' || '%'))
  LIMIT 512 OFFSET 31636",,,,,,,,"exec_simple_query, postgres.c:1045",""
 2021-01-05 12:22:58.721 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,20,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,77447627,LOG,00000,       "statement: UPDATE url_queue SET crawler_processing_sentinel=true WHERE url_queue.id IN (7194455, 7194460, 7192923, 7192935, 7192937, 7193023, 7193139, 7193143, 7193154,        7193155, 7193156, 7193160, 7193169, 7193193, 7193270, 7193282, 7193289, 7193312, 7193314, 7193320, 7193354, 7194866, 7194875, 7193336, 7193340, 7193357, 7193364, 7193365,       7193378, 7194600, 7194605, 7193456, 7193638, 7193702, 7193542, 7193558, 7193569, 7193582, 7193856, 7193848, 7193859, 7193877, 7194474, 7195427, 7194100, 7194101, 7194160,       7194369, 7194380, 7194382, 7193951, 7193957, 7193959, 7193970, 7193974, 7195155, 7195159, 7194564, 7194573, 7194546, 7194632, 7194686, 7194688, 7194940, 7194960, 7195101,       7195145, 7195149, 7195175, 7195197, 7195214, 7195216, 7195221, 7195277, 7195300, 7195304, 7195306, 7195329, 7195620, 7195632, 7195646, 7195891, 7195697, 7195709, 7195711,       7195713, 7195716, 7195730, 7195745, 7195747, 7195848, 7195850, 7195859, 7195871, 7196156, 7195920, 7195923, 7195955, 7195959, 7196179, 7196162, 7196165, 7196207, 7196209,       7197964, 7198011, 7196235, 7196262, 7196320, 7196335, 7196454, 7196456, 7196468, 7196469, 7196474, 7197358, 7197370, 7197392, 7197404, 7197421, 7197573, 7198707, 7198713,       7198726, 7198737, 7197631, 7197657, 7197707, 7197712, 7198182, 7198265, 7198099, 7198101, 7198245, 7198255, 7198088, 7198929, 7199166, 7198157, 7198277, 7198342, 7198394,       7198404, 7198410, 7198420, 7198488, 7198628, 7198643, 7198755, 7198779, 7198848, 7198866, 7199260, 7199329, 7198969, 7199290, 7199301, 7199332, 7199370, 7199374, 7199398,       7199413, 7199421, 7199436, 7199517, 7199521, 7199526, 7199545, 7199553, 7199604, 7199630, 7199809, 7199826, 7199855, 7200218, 7200363, 7200568, 7201533, 7201537, 7201555,       7201567, 7200617, 7200778, 7200748, 7200962, 7201990, 7201046, 7201071, 7201090, 7201333, 7201335, 7201339, 7201359, 7201362, 7201363, 7201400, 7201413, 7201516, 7204386,       7204410, 7204435, 7204462, 7204486, 7204500, 7201640, 7201652, 7201661, 7201687, 7201715, 7202354, 7202087, 7202118, 7202122, 7202272, 7202649, 7202494, 7202722, 7202782,       7202880, 7202919, 7202949, 7202969, 7202906, 7202938, 7202945, 7202951, 7203051, 7205300, 7205309, 7205311, 7203061, 7203074, 7203078, 7203215, 7203222, 7203469, 7203478,       7205776, 7205786, 7203552, 7203556, 7203566, 7203584, 7203599, 7203605, 7203613, 7203617, 7203634, 7203648, 7203668, 7203759, 7203775, 7203786, 7203898, 7203928, 7204012,       7204045, 7204053, 7204061, 7204063, 7204065, 7204067, 7204571, 7205210, 7204353, 7204369, 7204415, 7204473, 7204608, 7204610, 7204614, 7204631, 7204663, 7204672, 7204694,       7204961, 7205133, 7205145, 7205155, 7205325, 7205331, 7205353, 7205369, 7205374, 7205378, 7205386, 7205615, 7205482, 7205603, 7205612, 7205534, 7206614, 7205646, 7205659,       7205674, 7205692, 7205694, 7205703, 7205708, 7205713, 7205788, 7205796, 7205930, 7205947, 7205948, 7205949, 7205992, 7206090, 7206111, 7206117, 7207115, 7207117, 7207121,       7206418, 7206482, 7206425, 7206540, 7206762, 7206843, 7207333, 7207334, 7207336, 7207340, 7207356, 7207358, 7207376, 7206650, 7206696, 7206703, 7206721, 7206733, 7206738,       7206754, 7207211, 7207216, 7206877, 7206891, 7210076, 7207013, 7207031, 7207057, 7207127, 7207206, 7207274, 7207285, 7207286, 7207307, 7207506, 7207535, 7207732, 7207750, 7207762, 7207953, 7207962, 7208056, 7208109, 7208122, 7208186, 7208190, 7208195, 7210402, 7210406, 7210411, 7208304, 7208505, 7208621, 7208701, 7210168, 7208574, 7208596,       7208599, 7208601, 7208648, 7208898, 7208919, 7208926, 7208937, 7209076, 7209106, 7209133, 7209318, 7209329, 7209354, 7209451, 7209452, 7209463, 7209473, 7209474, 7209478, 7209483, 7209588, 7210029, 7210036, 7209796, 7210088, 7210188, 7210526, 7210527, 7210535, 7210280, 7210291, 7210292, 7210298, 7210310, 7210314, 7210330, 7210338, 7210676,       7210678, 7210680, 7210681, 7210693, 7210857, 7210874, 7210877, 7210882, 7210917, 7210928, 7212018, 7212031, 7211121, 7211129, 7211142, 7211250, 7211252, 7211271, 7212620,       7211444, 7211453, 7211538, 7211549, 7211552, 7211579, 7211823, 7211832, 7212178, 7212183, 7211884, 7211947, 7211975, 7212190, 7212208, 7212224, 7212235, 7212124, 7212222,       7212274, 7212440, 7212441, 7212443, 7212445, 7212446, 7213255, 7213967, 7214231, 7214248, 7214864, 7214878, 7212823, 7212837, 7212927, 7212948, 7212998, 7213014, 7213021,       7213026, 7212978, 7213105, 7213133, 7213303, 7213309, 7213432, 7213437, 7213507, 7214069, 7213701, 7213750, 7213773, 7213813, 7213721, 7215410, 7215442, 7215907, 7216622,       7214014, 7214367, 7214383, 7214428, 7214443, 7214181, 7214200, 7214275, 7214285, 7214300, 7214346, 7214370, 7214390, 7214554, 7214572, 7214463, 7214719, 7214731, 7214586, 7214596, 7214612, 7214629, 7214636, 7214669, 7214676, 7214687, 7216711, 7217132)",,,,,,,,"exec_simple_query, postgres.c:1045",""
 2021-01-05 12:23:01.722 UTC,"postgres","scraper",3614,"localhost.localdomain:56084",5ff45a1c.e1e,21,"idle in transaction",2021-01-05 12:22:52 UTC,5/57,77447626,LOG,00000,       "statement: ROLLBACK",,,,,,,,"exec_simple_query, postgres.c:1045",""
 2021-01-05 12:23:05.919 UTC,"postgres","scraper",3614,"localhost.localdomain:56084",5ff45a1c.e1e,22,"idle",2021-01-05 12:22:52 UTC,5/58,0,LOG,00000,"statement:                  ROLLBACK",,,,,,,,"exec_simple_query, postgres.c:1045",""
 2021-01-05 12:23:05.919 UTC,"postgres","scraper",3614,"localhost.localdomain:56084",5ff45a1c.e1e,23,"ROLLBACK",2021-01-05 12:22:52 UTC,5/58,0,WARNING,25P01,"there is no         transaction in progress",,,,,,,,"UserAbortTransactionBlock, xact.c:3946",""
 2021-01-05 12:23:05.919 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,21,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,77447627,LOG,00000,       "statement: ROLLBACK",,,,,,,,"exec_simple_query, postgres.c:1045",""
 2021-01-05 12:23:05.920 UTC,"postgres","scraper",3606,"localhost.localdomain:55748",5ff45a1a.e16,22,"idle",2021-01-05 12:22:50 UTC,,0,LOG,00000,"disconnection: session time: 0: 00:15.745 user=postgres database=scraper host=localhost.localdomain port=55748",,,,,,,,"log_disconnections, postgres.c:4677",""
 2021-01-05 12:23:05.952 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,22,"idle",2021-01-05 12:22:54 UTC,,0,LOG,00000,"disconnection: session time: 0: 00:10.994 user=postgres database=scraper host=localhost.localdomain port=56422",,,,,,,,"log_disconnections, postgres.c:4677",""
 2021-01-05 12:23:05.952 UTC,"postgres","scraper",3614,"localhost.localdomain:56084",5ff45a1c.e1e,24,"idle",2021-01-05 12:22:52 UTC,,0,LOG,00000,"disconnection: session time: 0: 00:13.606 user=postgres database=scraper host=localhost.localdomain port=56084",,,,,,,,"log_disconnections, postgres.c:4677",""

The SQLAlchemy documentation writes It’s critical that when using a connection pool, and by extension when using an Engine created via create_engine(), that the pooled connections are not shared to a forked process. TCP connections are represented as file descriptors, which usually work across process boundaries, meaning this will cause concurrent access to the file descriptor on behalf of two or more entirely independent Python interpreter states., maybe the pool is shared across processes?


Edit 3: I get the same error when I use python3.8 which structures the multiprocess spawn creation differently


Edit 4: When I change the queries to multiple simple insert statements that commit very frequently (rather than bulk insert statements, and commit infrequently), this error occurs much less often.

DaveTheAl
  • 1,995
  • 4
  • 35
  • 65
  • 1
    Can you check if you see something meaningful in the postgresql log file? – Michal T Jan 05 '21 at 07:03
  • @MichalT I am trying to install the logs... Also, when I deactive `SSL=off` in the conf file, it gets better, but now I get `sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.` errors. Will post the logs asap – DaveTheAl Jan 05 '21 at 12:18
  • @MichalT I added the postgres error logs now :) – DaveTheAl Jan 05 '21 at 12:28
  • I think this issue seems related, but not sure https://stackoverflow.com/questions/41279157/connection-problems-with-sqlalchemy-and-multiple-processes – DaveTheAl Jan 05 '21 at 12:31

1 Answers1

1

Instead of

class AsyncProcessWrapper:

    def __init__(self):
        resource_database = Database()
        self.main = Main(database=resource_database)

    def run(self):
        asyncio.run(self.main.run_main_loop())

you should do:

class AsyncProcessWrapper:

    def __init__(self):
        self.name = 'PROC:' + ''.join(random.choice(string.ascii_uppercase) for _ in range(4))

    def run_main_loop(self):

        self.resource_database = Database() 
        self.resource_database.engine.dispose()
        self.main = Main(name=self.name, database=self.resource_database)
        asyncio.run(self.main.run_main_loop())

as the connection pool otherwise gets spawned in the parent process, which you want to avoid according to this (__init__ is not called by the child process, but still in the parent loop)

DaveTheAl
  • 1,995
  • 4
  • 35
  • 65