A corrupt table doesn't necessarily cause a crash. You should still repair the table and, if possible, reload it from a backup. Operating on a corrupt table is flaky at best, and it is liable to return incorrect results, as you have already discovered.
Do not trust the fact that the system is not "exploding" -- a database passes through several intermediate states on its way to failure. The one you're in now could well be "I'm not exploding yet, I'm waiting for the corruption to spread and contaminate other tables' data". If you know the table is corrupt, act now.
On repairing InnoDB tables, see How do I repair an InnoDB table?.
To verify whether an InnoDB table is corrupt, see https://dba.stackexchange.com/questions/6191/how-do-you-identify-innodb-table-corruption.
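A quick first check along those lines (a sketch only; `mydb.suspect_table` is a placeholder for your own database and table names):

```
# Ask the server to run a logical check on the suspect table.
mysql -u root -p -e "CHECK TABLE mydb.suspect_table;"

# A full dump doubles as a read test: mysqldump touches every row,
# so it will error out on pages it cannot read.
mysqldump -u root -p mydb suspect_table > /dev/null && echo "table readable"
```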
Detecting corruption
To detect corruption you need an acceptance test that will examine a bunch of data and give it a clean bill of health -- or not. Exporting the table to SQL and seeing whether the export even completes, and/or running checks on tuple cardinality and/or relations, and... you get my drift.
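A crude acceptance test might look like this (a sketch only: the database, table, and column names -- `orders`, `customers`, `customer_id` -- are made up, `$PASS` is a placeholder for your credentials, and you would substitute whatever checks make sense for your schema):

```
#!/bin/sh
DB=mydb

# 1. Can the table be exported at all?
mysqldump -u root -p"$PASS" "$DB" orders > /tmp/orders.sql || echo "export FAILED"

# 2. Does tuple cardinality look sane (i.e. close to yesterday's count)?
mysql -u root -p"$PASS" -e "SELECT COUNT(*) FROM $DB.orders;"

# 3. Do relations hold? Orphaned rows suggest corruption or rogue writes.
mysql -u root -p"$PASS" -e "
  SELECT COUNT(*) AS orphans
  FROM $DB.orders o
  LEFT JOIN $DB.customers c ON o.customer_id = c.id
  WHERE c.id IS NULL;"
```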
On a table where no one is expected to write, so that any modification amounts to corruption, an MD5 of the disk file could be quicker.
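Something along these lines (assuming a file-per-table InnoDB layout and the default datadir; take the baseline while the table is known good and the file is quiescent, e.g. with the server stopped or after FLUSH TABLES ... FOR EXPORT):

```
# Take a baseline checksum of the known-good file.
md5sum /var/lib/mysql/mydb/static_table.ibd > /root/static_table.md5

# Later: any difference means the file changed when it should not have.
md5sum -c /root/static_table.md5
```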
To make things more efficient (e.g. in production systems) you can consider file snapshots, database replication, or even High Availability. These methods will detect programmatic corruption (e.g. a rogue UPDATE), but may not detect some kinds of hardware corruption on the master (giving a false negative: the checks on the slave pan out, while the data on the master is still corrupt) or may suffer mishaps on the slave (which fails and raises a false positive, since the data on the master is actually untainted).
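With replication in place, a cheap spot check is to compare table checksums on both sides (hostnames and credentials below are placeholders; Percona's pt-table-checksum automates the same idea more robustly):

```
# Same query against master and replica; differing results mean the
# two copies have diverged -- on which side is the next question.
mysql -h master.example.com  -u check -p"$PASS" -e "CHECKSUM TABLE mydb.orders;"
mysql -h replica.example.com -u check -p"$PASS" -e "CHECKSUM TABLE mydb.orders;"
```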
It is important (and efficient) to monitor system vital statistics, both to catch the first symptoms of an impending failure (e.g. with SMART) and to supply data for forensic investigation ("Funny that every time the DB failed it was always shortly after a sudden peak in system load -- what if we ferreted out what caused that?").
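For the disk side of that monitoring, smartmontools gives you both the verdict and the raw numbers to graph (the device path /dev/sda is an assumption):

```
# Overall health verdict.
smartctl -H /dev/sda

# Attributes worth tracking over time as early-failure symptoms.
smartctl -A /dev/sda | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Temperature'
```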
And of course rely on full and adequate backups -- and run a test restore every now and then (been there, done that, got my ass handed to me).
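A test restore need not be fancy -- restoring into a scratch database and re-running the acceptance test is already worth a lot (paths and names below are placeholders):

```
# Restore into a throwaway database, never over the live one.
mysql -u root -p"$PASS" -e "CREATE DATABASE restore_test;"
mysql -u root -p"$PASS" restore_test < /backups/mydb_latest.sql

# A backup you have never restored is a hope, not a backup.
mysql -u root -p"$PASS" -e "SELECT COUNT(*) FROM restore_test.orders;"
```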
Corruption causes [not related to original question]
The source of corruption varies with the software setup. In general, of course, something must intrude somewhere in the chain -- server memory representation, writer process, OS file handle, journaling, IOSS, OS cache, disk, disk cache, internal file layout -- and wreak havoc.
Improper system shutdown can mess things up at several levels, preventing data from being written at any stage of the pipeline.
Manhandling the files on disk messes with the very last stage (using a pipeline of its own, of which the server knows nothing).
Other, more esoteric possibilities exist:
- subtle firmware/hardware failure in the hard disk itself, either:
  - accidental and probably unrecoverable, due to disk wear and tear, defective firmware, or even a defective firmware update (I seem to remember, some years back, a Hitachi acoustic-management update that could be run against a slightly different disk model; after the update the disk "thought" it had more cache than it actually had, and writes to the nonexistent areas of the cache of course went directly to bit heaven);
  - "intentional" and probably recoverable: it is sometimes possible to stretch your hard disk too thin using `hdparm`. Setting the disk for the very top performance is all well and good if every component is suited to that level of performance and knows it, or at least is able to signal that it is not; sometimes all the "warning" you get is a system malfunction (see the `hdparm` sketch after this list).
- process space or IOSS corruption: saw this on an Apache installation where somehow, probably thanks to a CGI that was suid root, the access.log file was filling with a stream of GIF images that were supposed to go to the user's browser. I fixed it and nothing happened, but had it been a more vital file instead of a log...? Such problems may be difficult to diagnose, and you might need to inspect all log files to see whether some application noticed or did anything untoward.
- hard disk sector relocation: fabled to happen, never seen it myself, but modern hard disks have "spare" sectors they will swap in for defective sectors to keep sporting a "zero defect" surface. Except that if the defective sector is no longer readable by the time it is swapped for an empty one, the net effect is the same as that sector suddenly being zeroed. This you can easily check using SMART reporting (`hddhealth` or `smartctl`; see the sketch after this list).
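On the `hdparm` point above: before tuning anything, at least record what the drive is currently doing (the device path is a placeholder; disabling the write cache trades speed for durability):

```
# Show the current write-caching flag (no argument = query only).
hdparm -W /dev/sda

# Cross-check against the drive's own capability report.
hdparm -I /dev/sda | grep -i 'write cache'

# To trade speed for safety you would disable the cache:
#   hdparm -W0 /dev/sda
```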
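And for the sector-relocation point, the relevant SMART counters look like this (again, /dev/sda is an assumption):

```
# A nonzero raw value means sectors have already been swapped out;
# a growing value means the disk is still degrading.
smartctl -A /dev/sda | grep -E 'Reallocated_Sector_Ct|Reallocated_Event_Count'
```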
Many other possibilities exist, of course, depending on the setup. Googling for file corruption turns up a jillion pages; useful terms to add to the query are the filesystem (ext4, NTFS, btrfs, ...), the hard disk make and model, the OS, the software suffering the problem, and other software installed.