0

The MRG table contains 1B+ rows, spread over several smaller tables. Additional observations are added to the MRG table, which adds them to the last underlying table.

For some reason, there appears to be a couple of duplicate rows sharing the same primary key. I know that this is a limitation of the MRG tables and it is not really problematic in our case (it simply means useless redundancy). We believe that the duplicates were inserted following a system crash, such that the reprocessing wasn't properly handled by the application's code that deletes duplicates from a temporary table prior to insertion in the archive.

This being said, is there a set of best practices to maintain uniqueness that will not require too much processing time?

user3127882
  • 482
  • 5
  • 12
  • I doubt if your question will get much traction. I last used MRG 15 years ago, and have not heard of anyone using it since then. MyISAM is going away, and MRG with it. `PARTITIONing` is mostly a replacement for it; but even it has little use. Why are you using MRG instead of one giant table? – Rick James Nov 18 '17 at 13:42
  • Once a day several millions of rows are updated. Using a collection of smaller tables allows parallelization. Sticking to myIsam because it allows much faster and compact backups. But yes, most likely switch to partitioned innodb in the not too distant future. In the meantime using a unified table with the primary key to enforce uniqueness. Not ideal but does the job – user3127882 Nov 18 '17 at 15:12
  • MyISAM handles PARTITIONing, hence uniqueness checking. – Rick James Nov 18 '17 at 17:08
  • Can you parallelize a process on a partitioned table? with the MRG framework, easy to issue distinct UPDATE commands, one for each underlying table. Not clear at all that this can be done with partitions... – user3127882 Nov 18 '17 at 23:57
  • With newer versions of MySQL, you can specify the partition in the statement. What version do you have? However, it is not clear that you get any benefit. If you are already splitting into multiple threads, it may work without any attempt to specify which partition. Even with a billion rows, I would not necessarily partition (or `MERGE`) for performance. – Rick James Nov 19 '17 at 14:19

1 Answers1

0

You will always be at risk of duplicates.

https://dev.mysql.com/doc/refman/5.7/en/merge-storage-engine.html says:

...a MERGE table cannot enforce uniqueness over the set of underlying tables.

Don't use MERGE tables. MERGE only works with MyISAM, and you shouldn't be using MyISAM.

Instead, use PARTITIONING. You can enforce uniqueness across all partitions for a given table.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828