1

This question is about possible future improvements to a task I'm almost done with.

I have loaded a MySQL database with a subset of the Universal Medical Language System's Metathesaurus. I used a Java application called MetaMorphoSys, which generated a Bash wrapper, one SQL script for defining tables and importing data from text files, and another for indexing.

Loading and indexing a small UMLS subset (3.3 M rows in table MRSAT) goes to completion without errors. Loading a larger subset (39.4 M rows in MRSAT) is also successful, but then the indexing fails at this step after 1500 to 1800 seconds:

ALTER TABLE MRSAT ADD CONSTRAINT X_MRSAT_PK PRIMARY KEY BTREE (ATUI)

Error Code: 2013. Lost connection to MySQL server during query

My only use for the MySQL database is converting the relational rows to RDF triples. This conversion is performed by a single python script, which does seem to access the MRSAT table, but doesn't appear to use the ATUI column. At this point, I have extracted almost all of the data I want.

How can I tell if the absence of the primary key is detrimental to the performance of the RDF-generation queries?

I have increased some timeouts but haven't made all of the changes in suggested in other answers to that question.

The documentation from the provider suggests MySQL 5.5 over 5.6 due to disk space usage issues. I am using 5.6 anyway (as I have done in the past) on a generous AWS x1e.2xlarge instance running Ubuntu 18.

The documentation provides tuning suggestions for 5.5, but I don't see equivalent settings names in the 5.6 documentation. I have applied these:

bulk_insert_buffer_size = 100M
join_buffer_size = 100M
myisam_sort_buffer_size = 200M
query_cache_limit = 3M
query_cache_size = 100M
read_buffer_size = 200M
sort_buffer_size = 500M

For key_buffer = 600M I did key_buffer_size= 600M. I didn't do anything for table_cache = 300

The primary key is supposed to be set on the alphanumerical column ATUI

mysql> select * from MRSAT limit 9;
+----------+----------+----------+-----------+-------+---------+-------------+-------+--------+-----+------------+----------+------+
| CUI      | LUI      | SUI      | METAUI    | STYPE | CODE    | ATUI        | SATUI | ATN    | SAB | ATV        | SUPPRESS | CVF  |
+----------+----------+----------+-----------+-------+---------+-------------+-------+--------+-----+------------+----------+------+
| C0000005 | L0000005 | S0007492 | A26634265 | AUI   | D012711 | AT212456753 | NULL  | TH     | MSH | UNK (19XX) | N        | NULL |
| C0000005 | L0000005 | S0007492 | A26634265 | AUI   | D012711 | AT212480766 | NULL  | TERMUI | MSH | T037573    | N        | NULL |
| C0000005 | L0000005 | S0007492 | A26634265 | SCUI  | D012711 | AT60774257  | NULL  | RN     | MSH | 0          | N        | NULL |
| C0000005 | L0270109 | S0007491 | A26634266 | AUI   | D012711 | AT212327137 | NULL  | TERMUI | MSH | T037574    | N        | NULL |
| C0000005 | L0270109 | S0007491 | A26634266 | AUI   | D012711 | AT212456754 | NULL  | TH     | MSH | UNK (19XX) | N        | NULL |
| C0000005 | NULL     | NULL     | NULL      | CUI   | NULL    | AT00368929  | NULL  | DA     | MTH | 19900930   | N        | NULL |
| C0000005 | NULL     | NULL     | NULL      | CUI   | NULL    | AT01344283  | NULL  | MR     | MTH | 20020910   | N        | NULL |
| C0000005 | NULL     | NULL     | NULL      | CUI   | NULL    | AT02319637  | NULL  | ST     | MTH | R          | N        | NULL |
| C0000039 | L0000035 | S0007560 | A26674543 | AUI   | D015060 | AT212481191 | NULL  | TH     | MSH | UNK (19XX) | N        | NULL |
+----------+----------+----------+-----------+-------+---------+-------------+-------+--------+-----+------------+----------+------+
Mark Miller
  • 3,011
  • 1
  • 14
  • 34

0 Answers0