1

While trying to delete from a table I sometimes get Lock wait timeout exceeded errors. This seems to happen when many .populate commands (with reserve_jobs=True) are running in parallel. Removing those jobs (aborting their execution) and restarting does not help immediately, but after some minutes. Any ideas what is going on there? The delete command itself should not have touched a dataset that is currently being processed.

InternalError                             Traceback (most recent call last)
<ipython-input-16-c52a52b02f79> in <module>()
----> 1 MakeDatasetsSessions.populate()

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\datajoint\autopopulate.py in populate(self, suppress_errors, reserve_jobs, order, limit, max_calls, display_progress, *restrictions)
    135                     call_count += 1
    136                     try:
--> 137                         make(dict(key))
    138                     except (KeyboardInterrupt, SystemExit, Exception) as error:
    139                         try:

c:\work\python\dj-moser-imaging\dj_schemas\preprocess.py in make(self, key)
    470                                     dataset_to_delete = (Dataset * Session.Data & 'session_name="{}"'.format(session_dict['session_name'])\
    471                                                          & 'datasettype="{}"'.format(datasettype)).fetch1('dataset_name')
--> 472                                     (Dataset & 'dataset_name = "{}"'.format(dataset_to_delete)).delete()
    473                             except dj.DataJointError:
    474                                  logger.error('Session {} for animal {}: Tried to delete datasets after mismatch was detected, but could not proceed. '.format(\

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\datajoint\base_relation.py in delete(self, verbose)
    358         try:
    359             for r in reversed(list(delete_list.values())):
--> 360                 count = r.delete_quick(get_count=True)
    361                 total += count
    362                 if (verbose or safe) and count:

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\datajoint\base_relation.py in delete_quick(self, get_count)
    296         """
    297         query = 'DELETE FROM ' + self.full_table_name + self.where_clause
--> 298         self.connection.query(query)
    299         count = self.connection.query("SELECT ROW_COUNT()").fetchone()[0] if get_count else None
    300         self._log(query[:255])

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\datajoint\connection.py in query(self, query, args, as_dict, suppress_warnings)
    131                     # suppress all warnings arising from underlying SQL library
    132                     warnings.simplefilter("ignore")
--> 133                 cur.execute(query, args)
    134 
    135         except err.OperationalError as e:

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\pymysql\cursors.py in execute(self, query, args)
    168         query = self.mogrify(query, args)
    169 
--> 170         result = self._query(query)
    171         self._executed = query
    172         return result

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\pymysql\cursors.py in _query(self, q)
    326         self._last_executed = q
    327         self._clear_result()
--> 328         conn.query(q)
    329         self._do_get_result()
    330         return self.rowcount

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\pymysql\connections.py in query(self, sql, unbuffered)
    514                 sql = sql.encode(self.encoding, 'surrogateescape')
    515         self._execute_command(COMMAND.COM_QUERY, sql)
--> 516         self._affected_rows = self._read_query_result(unbuffered=unbuffered)
    517         return self._affected_rows
    518 

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\pymysql\connections.py in _read_query_result(self, unbuffered)
    725         else:
    726             result = MySQLResult(self)
--> 727             result.read()
    728         self._result = result
    729         if result.server_status is not None:

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\pymysql\connections.py in read(self)
   1064     def read(self):
   1065         try:
-> 1066             first_packet = self.connection._read_packet()
   1067 
   1068             if first_packet.is_ok_packet():

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\pymysql\connections.py in _read_packet(self, packet_type)
    681 
    682         packet = packet_type(buff, self.encoding)
--> 683         packet.check_error()
    684         return packet
    685 

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\pymysql\protocol.py in check_error(self)
    218             errno = self.read_uint16()
    219             if DEBUG: print("errno =", errno)
--> 220             err.raise_mysql_exception(self._data)
    221 
    222     def dump(self):

~\AppData\Local\conda\conda\envs\analysis\lib\site-packages\pymysql\err.py in raise_mysql_exception(data)
    107         errval = data[3:].decode('utf-8', 'replace')
    108     errorclass = error_map.get(errno, InternalError)
--> 109     raise errorclass(errno, errval)

InternalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
Braiam
  • 1
  • 11
  • 47
  • 78
Horst
  • 167
  • 1
  • 6
  • You should show the code that is producting the error. [This question](https://stackoverflow.com/questions/5836623/getting-lock-wait-timeout-exceeded-try-restarting-transaction-even-though-im?rq=1) tells you that this is a error from the underlying mysql database, since something is locking and your current query waits too much. Also, it could be someone else running another long query. – Braiam Aug 19 '22 at 13:14

1 Answers1

2

Timeouts should rarely happen in regular DataJoint use. If they occur, we need to understand the problem thoroughly and develop recommendations to help avoid them.

From your example, it appears as though the make method of MakeDatasetsSessions attempted to delete something. Deletes from the populate call are not standard practice and we would need to understand (a) why deletes were necessary and (b) why this particular delete caused a timeout.

To understand this, would you please post the code of the make method and the ERD that include MakeDataSetsSessions and the table that was deleted from?