I am writing a simple web-scraper in python3.7
using asyncio
and multiprocessing
. The general architecture looks as follows:
for i in range(self.number_processes - len(processes)):
p = Process(target=AsyncProcessWrapper().run_main_loop)
time.sleep(0.3)
p.start()
start_time = time.time()
where AsyncProcessWrapper
is defined as:
class AsyncProcessWrapper:
def __init__(self):
resource_database = Database()
# This is where the async logic takes place, following a producer-consumer pattern.
# I will hide this logic for simplicity.
self.main = Main(database=resource_database)
def run(self):
asyncio.run(self.main.run_main_loop())
The database connection is establishes once, and never closed (because this is a scraper, it takes a while until data is downloaded).
The database class init looks as follows:
class Database:
def __init__(self):
db_url = os.getenv('DatabaseUrl')
self.engine = create_engine(db_url, encoding='utf8', poolclass=NullPool)
Session = sessionmaker()
Session.configure(bind=self.engine)
self.session = Session()
def create_url_entity(self, urls):
for url in urls:
url_entity_obj = URLEntity(
url=url,
engine_version=self.engine_version
)
to_insert.append(url_entity_obj)
self.session.bulk_save_objects(to_insert)
For processes=2
(or any higher number), I don't get any errors when I run the above snippets on my MacOS which has 2 physical cores.
However, the moment that I set processes=2
on my Ubuntu 18.05 machine which has 8 physical cores, I keep getting the following error.
david@bob:~$ screen -r myproject
cursor, statement, parameters, context
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/david/myproject/myproject/engine/core.py", line 25, in run_main_loop
asyncio.run(self.main.run_main_loop())
File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
return future.result()
File "/home/david/myproject/myproject/core/main.py", line 150, in run_main_loop
self.resource_database.create_markup_record(self.buffer_markup_records)
File "/home/david/myproject/myproject/resources/db.py", line 281, in create_markup_record
.join(RawMarkup, isouter=True) \
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3373, in all
return list(self)
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
return self._execute_and_instances(context)
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
e, statement, parameters, cursor, context
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/home/david/myproject/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL SYSCALL error: EOF detected
[SQL: SELECT url.id AS url_id, url.url AS url_url, raw_markup.id AS raw_markup_id
FROM url LEFT OUTER JOIN raw_markup ON url.id = raw_markup.url_id
WHERE url.url IN (%(url_1)s, %(url_2)s, %(url_3)s, %(url_4)s, %(url_5)s, %(url_6)s, %(url_7)s, %(url_8)s, %(url_9)s, %(url_10)s, %(url_11)s, %(url_12)s, %(url_13)s, %(url_14)s, %(url_15)s, %(url_16)s, %(url_17)s, %(url_18)s)]
[parameters: {'url_1': 'placeholder_string_1', 'url_2': 'placeholder_string_2', 'url_3': 'placeholder_string_3', ...'url_18': 'placeholder_string_18'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
The database on my Ubuntu is significantly larger (the ubuntu machine has 34'744'900
rows in its largest table, whereas my local machine has about 1'000'000
rows).
Additional information:
- I am prototyping my application on my mac (version 10.15). I am then uploading my code to execute on a cluster. The Postgres versions for both machines were equivalent (Postgres 10), now
- I upgraded the cluster's Postgres version to 12. This does not change anything.
- I played around with committing more often, but this does not change anything.
- The error keeps increasing when I spawn even more processes (i.e.
processes=7
). - I use
sqlalchemy==1.3
. - Increasing the niceness
nice -n 17 python -m run
does not change anything (makes the async workers run faster, it feels). - I read somewhere that it could be due to
libpq
which behaves weird on Ubuntu. Any ideas if this could cause it?
Any idea what could cause this error, and what I could try to resolve this?
EDIT:
I just started two single (no multiprocessing) async processes using bash with the following command on my MacOS:
trap 'kill %1' SIGINT
python -m myproject.core.main | tee 1.log | sed -e 's/^/[Command1] /' & python -m myproject.core.main | tee 2.log | sed -e 's/^/[Command2] /'
which now also just blocks execution completely and returns a Postgres SSL Error.
SSL error in data received
protocol: <asyncio.sslproto.SSLProtocol object at 0x115f0b828>
transport: <_SelectorSocketTransport fd=15 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/asyncio/sslproto.py", line 526, in data_received
ssldata, appdata = self._sslpipe.feed_ssldata(data)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/asyncio/sslproto.py", line 207, in feed_ssldata
self._sslobj.unwrap()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/ssl.py", line 767, in unwrap
return self._sslobj.shutdown()
ssl.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2609)
The same bash-problem also happens on the Ubuntu system.
EDIT 2:
After configuring the logs, I get the following error logs (only last few lines). I use 7 processes in python just to make errors appear faster:
2021-01-05 12:22:56.597 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,16,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,0,LOG,00000,"statement: UPDATE url_queue SET crawler_processing_sentinel=true WHERE url_queue.id IN (6815557, 6813399, 6813430, 6814589, 6814591, 6813688, 6813689, 6813691, 6813717, 6813777, 6813819, 6813926, 6813929, 6814067, 6814217, 6814237, 6814257, 6814967, 6815825, 6815882, 6815887, 6815890, 6814625, 6814626, 6814629, 6814926, 6814929, 6814901, 6815026, 6815035, 6815058, 6815096, 6815008, 6815009, 6815046, 6815120, 6815493, 6815521, 6815534, 6815715, 6815719, 6815730, 6815733, 6815746, 6815772, 6815797, 6816195, 6816225, 6816234, 6816583, 6816618, 6816860, 6816896, 6817134, 6816944, 6816955, 6818076, 6818164, 6818171, 6816983, 6817018, 6817024, 6817029, 6817030, 6818130, 6817320, 6817385, 6817436, 6817481, 6817487, 6817615, 6817622, 6817711, 6817854, 6817857, 6817877, 6817881, 6817885, 6817892, 6818000, 6818006, 6818082, 6818127, 6818702, 6818818, 6818852, 6819325, 6819329, 6819338, 6819394, 6819399, 6819424, 6819156, 6819839, 6819842, 6819857, 6819861, 6819318, 6819320, 6819371, 6819376, 6819664, 6820000, 6820018, 6819774, 6819778, 6819787, 6819804, 6819818, 6819838, 6819870, 6819951, 6820299, 6820303, 6822442, 6822970, 6822988, 6822996, 6820077, 6820288, 6820316, 6820328, 6820337, 6820391, 6820395, 6820398, 6820399, 6820425, 6820432, 6820619, 6820831, 6820895, 6820958, 6820724, 6821014, 6821018, 6821027, 6821032, 6821041, 6821063, 6821151, 6821171, 6821183, 6821189, 6821207, 6821234, 6821249, 6821253, 6821263, 6821389, 6821468, 6822231, 6822292, 6821796, 6822756, 6823032, 6823105, 6823127, 6821720, 6821722, 6821746, 6821782, 6821818, 6821819, 6821821, 6821789, 6821835, 6821844, 6821862, 6821914, 6821989, 6822004, 6822006, 6822020, 6822028, 6822866, 6822909, 6822913, 6822933, 6822113, 6823419, 6823423, 6823441, 6823453, 6823469, 6823472, 6822360, 6822404, 6822413, 6822446, 6822450, 6822370, 6822398, 6824217, 6822459, 6822468, 6822567, 6822587, 6822603, 6822623, 6822635, 6822639, 6822650, 6822667, 6822669, 6822678, 6822712, 6822718, 6822720, 6822724, 6823138, 6823380, 6823383, 6823562, 6823595, 6822958, 6823026, 6823044, 6823053, 6823087, 6823237, 6823151, 6823418, 6823754, 6823772, 6823774, 6823785, 6823790, 6823326, 6823331, 6823336, 6823338, 6823341, 6823450, 6823455, 6823513, 6823545, 6823551, 6823557, 6823561, 6823573, 6823713, 6823958, 6823967, 6823971, 6823991, 6824324, 6824503, 6824593, 6825231, 6825292, 6825296, 6824336, 6824383, 6824440, 6824760, 6824767, 6824785, 6824809, 6824636, 6824680, 6824800, 6824805, 6824892, 6824894, 6824839, 6825482, 6825877, 6825539, 6825575, 6825593, 6825595, 6824961, 6824976, 6825948, 6826415, 6826433, 6826439, 6825283, 6825300, 6825301, 6826988, 6827000, 6827015, 6826745, 6826782, 6827283, 6825456, 6825461, 6825462, 6827730, 6827735, 6825669, 6825680, 6825700, 6825843, 6825997, 6826059, 6828078, 6828080, 6828085, 6825993, 6826007, 6826012, 6826015, 6826018, 6826022, 6826029, 6826035, 6826272, 6826281, 6826296, 6826298, 6826304, 6826340, 6826460, 6826646, 6826685, 6826771, 6826772, 6826853, 6826880, 6826897, 6826899, 6827037, 6827040, 6827063, 6827133, 6827142, 6827192, 6827203, 6828774, 6828785, 6827385, 6827401, 6827825, 6827569, 6827580, 6827936, 6827939, 6827946, 6827955, 6827993, 6828067, 6830231, 6830256, 6830258, 6828117, 6828179, 6828183, 6828207, 6828209, 6828211, 6829827, 6828317, 6828345, 6828353, 6828363, 6828371, 6828586, 6828588, 6828683, 6828696, 6828698, 6828717, 6828726, 6828731, 6828748, 6829184, 6829239, 6829272, 6829309, 6830332, 6828997, 6829055, 6829067, 6829639, 6829652, 6829661, 6829663, 6829252, 6829260, 6829298, 6829398, 6829400, 6829411, 6829416, 6829368, 6829372, 6829376, 6829378, 6829475, 6829527, 6829535, 6830527, 6829713, 6829715, 6829730, 6829890, 6829900, 6829934, 6830124, 6830228, 6830241, 6830245, 6830274, 6830430, 6830453, 6830450, 6830521, 6830803, 6831342, 6832991, 6835788, 6830676, 6830718, 6830719, 6830738, 6830842, 6830931, 6830883, 6830904, 6830927, 6830935, 6831322, 6831469, 6831041, 6831055, 6831071, 6831079, 6831122, 6831155, 6831205, 6831239, 6831246, 6831271, 6831677, 6831896, 6832126, 6832169, 6832176, 6832182, 6832207, 6832224, 6832238, 6832288, 6832304, 6832323, 6832687, 6832422, 6832423, 6832428, 6832437, 6832457, 6832589, 6832655, 6832720, 6832783, 6832802, 6833082, 6833108, 6832908, 6832927, 6832939, 6832955, 6833047, 6833063, 6833667, 6833170, 6833173, 6833191, 6833207, 6833220, 6833222, 6833238, 6833241, 6833353, 6833361, 6833485, 6833501, 6833544, 6833585, 6834233, 6834240, 6834266, 6834283, 6834293, 6833899, 6833902, 6833910, 6833866, 6834579, 6834625, 6834049, 6834103, 6834105, 6834114, 6834125, 6834134, 6834139, 6834149, 6834337, 6835768, 6835829, 6834530, 6834536, 6834562, 6834576, 6835135, 6835141, 6836178, 6834857)",,,,,,,,"exec_simple_query, postgres.c:1045",""
2021-01-05 12:22:56.921 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,17,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,77447627,LOG,00000, "statement: SELECT count(*) AS count_1
FROM (SELECT raw_markup.id AS raw_markup_id, raw_markup.url_id AS raw_markup_url_id, raw_markup.markup AS raw_markup_markup, raw_markup.spider_processing_sentinel AS raw_markup_spider_processing_sentinel, raw_markup.spider_processed_sentinel AS raw_markup_spider_processed_sentinel, raw_markup.spider_skip AS raw_markup_spider_skip, raw_markup.version_spider AS raw_markup_version_spider, raw_markup.updated_at AS raw_markup_updated_at
FROM raw_markup) AS anon_1",,,,,,,,"exec_simple_query, postgres.c:1045",""
2021-01-05 12:22:56.954 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,18,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,77447627,LOG,00000, "statement: SELECT count(*) AS count_1
FROM (SELECT url_queue.id AS url_queue_id, url_queue.url_id AS url_queue_url_id, url_queue.crawler_processing_sentinel AS url_queue_crawler_processing_sentinel, url_queue. crawler_processed_sentinel AS url_queue_crawler_processed_sentinel, url_queue.crawler_skip AS url_queue_crawler_skip, url_queue.retries AS url_queue_retries, url_queue. occurrences AS url_queue_occurrences, url_queue.version_crawl_frontier AS url_queue_version_crawl_frontier, url_queue.created_at AS url_queue_created_at
FROM url_queue) AS anon_1",,,,,,,,"exec_simple_query, postgres.c:1045",""
2021-01-05 12:22:57.153 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,19,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,77447627,LOG,00000, "statement: SELECT url.url AS url_url, url_queue.id AS url_queue_id
FROM url_queue, url LEFT OUTER JOIN raw_markup ON url.id = raw_markup.url_id
WHERE url.id = url_queue.url_id AND raw_markup.id IS NULL AND url_queue.retries < 4 AND url_queue.crawler_processing_sentinel = false AND ((url.url LIKE '%' || 'thomasnet.com' || '%') OR (url.url LIKE '%' || 'go4worldbusiness.com' || '%'))
LIMIT 512 OFFSET 31636",,,,,,,,"exec_simple_query, postgres.c:1045",""
2021-01-05 12:22:58.721 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,20,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,77447627,LOG,00000, "statement: UPDATE url_queue SET crawler_processing_sentinel=true WHERE url_queue.id IN (7194455, 7194460, 7192923, 7192935, 7192937, 7193023, 7193139, 7193143, 7193154, 7193155, 7193156, 7193160, 7193169, 7193193, 7193270, 7193282, 7193289, 7193312, 7193314, 7193320, 7193354, 7194866, 7194875, 7193336, 7193340, 7193357, 7193364, 7193365, 7193378, 7194600, 7194605, 7193456, 7193638, 7193702, 7193542, 7193558, 7193569, 7193582, 7193856, 7193848, 7193859, 7193877, 7194474, 7195427, 7194100, 7194101, 7194160, 7194369, 7194380, 7194382, 7193951, 7193957, 7193959, 7193970, 7193974, 7195155, 7195159, 7194564, 7194573, 7194546, 7194632, 7194686, 7194688, 7194940, 7194960, 7195101, 7195145, 7195149, 7195175, 7195197, 7195214, 7195216, 7195221, 7195277, 7195300, 7195304, 7195306, 7195329, 7195620, 7195632, 7195646, 7195891, 7195697, 7195709, 7195711, 7195713, 7195716, 7195730, 7195745, 7195747, 7195848, 7195850, 7195859, 7195871, 7196156, 7195920, 7195923, 7195955, 7195959, 7196179, 7196162, 7196165, 7196207, 7196209, 7197964, 7198011, 7196235, 7196262, 7196320, 7196335, 7196454, 7196456, 7196468, 7196469, 7196474, 7197358, 7197370, 7197392, 7197404, 7197421, 7197573, 7198707, 7198713, 7198726, 7198737, 7197631, 7197657, 7197707, 7197712, 7198182, 7198265, 7198099, 7198101, 7198245, 7198255, 7198088, 7198929, 7199166, 7198157, 7198277, 7198342, 7198394, 7198404, 7198410, 7198420, 7198488, 7198628, 7198643, 7198755, 7198779, 7198848, 7198866, 7199260, 7199329, 7198969, 7199290, 7199301, 7199332, 7199370, 7199374, 7199398, 7199413, 7199421, 7199436, 7199517, 7199521, 7199526, 7199545, 7199553, 7199604, 7199630, 7199809, 7199826, 7199855, 7200218, 7200363, 7200568, 7201533, 7201537, 7201555, 7201567, 7200617, 7200778, 7200748, 7200962, 7201990, 7201046, 7201071, 7201090, 7201333, 7201335, 7201339, 7201359, 7201362, 7201363, 7201400, 7201413, 7201516, 7204386, 7204410, 7204435, 7204462, 7204486, 7204500, 7201640, 7201652, 7201661, 7201687, 7201715, 7202354, 7202087, 7202118, 7202122, 7202272, 7202649, 7202494, 7202722, 7202782, 7202880, 7202919, 7202949, 7202969, 7202906, 7202938, 7202945, 7202951, 7203051, 7205300, 7205309, 7205311, 7203061, 7203074, 7203078, 7203215, 7203222, 7203469, 7203478, 7205776, 7205786, 7203552, 7203556, 7203566, 7203584, 7203599, 7203605, 7203613, 7203617, 7203634, 7203648, 7203668, 7203759, 7203775, 7203786, 7203898, 7203928, 7204012, 7204045, 7204053, 7204061, 7204063, 7204065, 7204067, 7204571, 7205210, 7204353, 7204369, 7204415, 7204473, 7204608, 7204610, 7204614, 7204631, 7204663, 7204672, 7204694, 7204961, 7205133, 7205145, 7205155, 7205325, 7205331, 7205353, 7205369, 7205374, 7205378, 7205386, 7205615, 7205482, 7205603, 7205612, 7205534, 7206614, 7205646, 7205659, 7205674, 7205692, 7205694, 7205703, 7205708, 7205713, 7205788, 7205796, 7205930, 7205947, 7205948, 7205949, 7205992, 7206090, 7206111, 7206117, 7207115, 7207117, 7207121, 7206418, 7206482, 7206425, 7206540, 7206762, 7206843, 7207333, 7207334, 7207336, 7207340, 7207356, 7207358, 7207376, 7206650, 7206696, 7206703, 7206721, 7206733, 7206738, 7206754, 7207211, 7207216, 7206877, 7206891, 7210076, 7207013, 7207031, 7207057, 7207127, 7207206, 7207274, 7207285, 7207286, 7207307, 7207506, 7207535, 7207732, 7207750, 7207762, 7207953, 7207962, 7208056, 7208109, 7208122, 7208186, 7208190, 7208195, 7210402, 7210406, 7210411, 7208304, 7208505, 7208621, 7208701, 7210168, 7208574, 7208596, 7208599, 7208601, 7208648, 7208898, 7208919, 7208926, 7208937, 7209076, 7209106, 7209133, 7209318, 7209329, 7209354, 7209451, 7209452, 7209463, 7209473, 7209474, 7209478, 7209483, 7209588, 7210029, 7210036, 7209796, 7210088, 7210188, 7210526, 7210527, 7210535, 7210280, 7210291, 7210292, 7210298, 7210310, 7210314, 7210330, 7210338, 7210676, 7210678, 7210680, 7210681, 7210693, 7210857, 7210874, 7210877, 7210882, 7210917, 7210928, 7212018, 7212031, 7211121, 7211129, 7211142, 7211250, 7211252, 7211271, 7212620, 7211444, 7211453, 7211538, 7211549, 7211552, 7211579, 7211823, 7211832, 7212178, 7212183, 7211884, 7211947, 7211975, 7212190, 7212208, 7212224, 7212235, 7212124, 7212222, 7212274, 7212440, 7212441, 7212443, 7212445, 7212446, 7213255, 7213967, 7214231, 7214248, 7214864, 7214878, 7212823, 7212837, 7212927, 7212948, 7212998, 7213014, 7213021, 7213026, 7212978, 7213105, 7213133, 7213303, 7213309, 7213432, 7213437, 7213507, 7214069, 7213701, 7213750, 7213773, 7213813, 7213721, 7215410, 7215442, 7215907, 7216622, 7214014, 7214367, 7214383, 7214428, 7214443, 7214181, 7214200, 7214275, 7214285, 7214300, 7214346, 7214370, 7214390, 7214554, 7214572, 7214463, 7214719, 7214731, 7214586, 7214596, 7214612, 7214629, 7214636, 7214669, 7214676, 7214687, 7216711, 7217132)",,,,,,,,"exec_simple_query, postgres.c:1045",""
2021-01-05 12:23:01.722 UTC,"postgres","scraper",3614,"localhost.localdomain:56084",5ff45a1c.e1e,21,"idle in transaction",2021-01-05 12:22:52 UTC,5/57,77447626,LOG,00000, "statement: ROLLBACK",,,,,,,,"exec_simple_query, postgres.c:1045",""
2021-01-05 12:23:05.919 UTC,"postgres","scraper",3614,"localhost.localdomain:56084",5ff45a1c.e1e,22,"idle",2021-01-05 12:22:52 UTC,5/58,0,LOG,00000,"statement: ROLLBACK",,,,,,,,"exec_simple_query, postgres.c:1045",""
2021-01-05 12:23:05.919 UTC,"postgres","scraper",3614,"localhost.localdomain:56084",5ff45a1c.e1e,23,"ROLLBACK",2021-01-05 12:22:52 UTC,5/58,0,WARNING,25P01,"there is no transaction in progress",,,,,,,,"UserAbortTransactionBlock, xact.c:3946",""
2021-01-05 12:23:05.919 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,21,"idle in transaction",2021-01-05 12:22:54 UTC,6/17,77447627,LOG,00000, "statement: ROLLBACK",,,,,,,,"exec_simple_query, postgres.c:1045",""
2021-01-05 12:23:05.920 UTC,"postgres","scraper",3606,"localhost.localdomain:55748",5ff45a1a.e16,22,"idle",2021-01-05 12:22:50 UTC,,0,LOG,00000,"disconnection: session time: 0: 00:15.745 user=postgres database=scraper host=localhost.localdomain port=55748",,,,,,,,"log_disconnections, postgres.c:4677",""
2021-01-05 12:23:05.952 UTC,"postgres","scraper",3626,"localhost.localdomain:56422",5ff45a1e.e2a,22,"idle",2021-01-05 12:22:54 UTC,,0,LOG,00000,"disconnection: session time: 0: 00:10.994 user=postgres database=scraper host=localhost.localdomain port=56422",,,,,,,,"log_disconnections, postgres.c:4677",""
2021-01-05 12:23:05.952 UTC,"postgres","scraper",3614,"localhost.localdomain:56084",5ff45a1c.e1e,24,"idle",2021-01-05 12:22:52 UTC,,0,LOG,00000,"disconnection: session time: 0: 00:13.606 user=postgres database=scraper host=localhost.localdomain port=56084",,,,,,,,"log_disconnections, postgres.c:4677",""
The SQLAlchemy documentation writes It’s critical that when using a connection pool, and by extension when using an Engine created via create_engine(), that the pooled connections are not shared to a forked process. TCP connections are represented as file descriptors, which usually work across process boundaries, meaning this will cause concurrent access to the file descriptor on behalf of two or more entirely independent Python interpreter states.
, maybe the pool is shared across processes?
Edit 3: I get the same error when I use python3.8
which structures the multiprocess spawn creation differently
Edit 4: When I change the queries to multiple simple insert statements that commit very frequently (rather than bulk insert statements, and commit infrequently), this error occurs much less often.