A corrupt table doesn't necessarily cause a crash. You should still repair the table and, if possible, reload it from a backup. Operating on a corrupt table is flaky at best, and it is liable to return incorrect results, as you have already discovered.
Do not trust the fact that the system is not "exploding" -- a database passes through several intermediate states on its way to failure. The one you're in now could well be "I'm not exploding yet, I'm waiting for the corruption to spread and contaminate other tables' data". If you know the table is corrupt, act now.
On repairing InnoDB tables, see How do I repair an InnoDB table?.
To verify whether an InnoDB table is corrupt, see https://dba.stackexchange.com/questions/6191/how-do-you-identify-innodb-table-corruption.
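A quick first check along those lines (a sketch only; `mydb.suspect_table` is a placeholder for your own database and table names):

```
# Ask the server to run a logical check on the suspect table.
mysql -u root -p -e "CHECK TABLE mydb.suspect_table;"

# A full dump doubles as a read test: mysqldump touches every row,
# so it will error out on pages it cannot read.
mysqldump -u root -p mydb suspect_table > /dev/null && echo "table readable"
```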
Detecting corruption
To detect corruption you need an acceptance test that will examine a bunch of data and give it a clean bill of health -- or not. Exporting the table to SQL and seeing whether the export even completes, and/or running checks on tuple cardinality and/or relations, and... you get my drift.
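A crude acceptance test might look like this (a sketch only: the database, table, and column names -- `orders`, `customers`, `customer_id` -- are made up, `$PASS` is a placeholder for your credentials, and you would substitute whatever checks make sense for your schema):

```
#!/bin/sh
DB=mydb

# 1. Can the table be exported at all?
mysqldump -u root -p"$PASS" "$DB" orders > /tmp/orders.sql || echo "export FAILED"

# 2. Does tuple cardinality look sane (i.e. close to yesterday's count)?
mysql -u root -p"$PASS" -e "SELECT COUNT(*) FROM $DB.orders;"

# 3. Do relations hold? Orphaned rows suggest corruption or rogue writes.
mysql -u root -p"$PASS" -e "
  SELECT COUNT(*) AS orphans
  FROM $DB.orders o
  LEFT JOIN $DB.customers c ON o.customer_id = c.id
  WHERE c.id IS NULL;"
```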
On a table where no one is expected to write, so that any modification amounts to corruption, an MD5 of the disk file could be quicker.
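Something along these lines (assuming a file-per-table InnoDB layout and the default datadir; take the baseline while the table is known good and the file is quiescent, e.g. with the server stopped or after FLUSH TABLES ... FOR EXPORT):

```
# Take a baseline checksum of the known-good file.
md5sum /var/lib/mysql/mydb/static_table.ibd > /root/static_table.md5

# Later: any difference means the file changed when it should not have.
md5sum -c /root/static_table.md5
```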
To make things more efficient (e.g. in production systems) you can consider file snapshots, database replication, or even High Availability. These methods will detect programmatic corruption (e.g. a rogue UPDATE), but may not detect some kinds of hardware corruption on the master (giving a false negative: the checks on the slave pan out, while the data on the master is still corrupt) or may suffer mishaps on the slave (which fails and raises a false positive, since the data on the master is actually untainted).
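With replication in place, a cheap spot check is to compare table checksums on both sides (hostnames and credentials below are placeholders; Percona's pt-table-checksum automates the same idea more robustly):

```
# Same query against master and replica; differing results mean the
# two copies have diverged -- on which side is the next question.
mysql -h master.example.com  -u check -p"$PASS" -e "CHECKSUM TABLE mydb.orders;"
mysql -h replica.example.com -u check -p"$PASS" -e "CHECKSUM TABLE mydb.orders;"
```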
It is important (and efficient) to monitor system vital statistics, both to catch the first symptoms of an impending failure (e.g. with SMART) and to supply data for forensic investigation ("Funny that every time the DB failed it was always shortly after a sudden peak in system load -- what if we ferreted out what caused that?").
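For the disk side of that monitoring, smartmontools gives you both the verdict and the raw numbers to graph (the device path /dev/sda is an assumption):

```
# Overall health verdict.
smartctl -H /dev/sda

# Attributes worth tracking over time as early-failure symptoms.
smartctl -A /dev/sda | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Temperature'
```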
And of course rely on full and adequate backups -- and run a test restore every now and then (been there, done that, got my ass handed to me).
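A test restore need not be fancy -- restoring into a scratch database and re-running the acceptance test is already worth a lot (paths and names below are placeholders):

```
# Restore into a throwaway database, never over the live one.
mysql -u root -p"$PASS" -e "CREATE DATABASE restore_test;"
mysql -u root -p"$PASS" restore_test < /backups/mydb_latest.sql

# A backup you have never restored is a hope, not a backup.
mysql -u root -p"$PASS" -e "SELECT COUNT(*) FROM restore_test.orders;"
```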
Corruption causes [not related to original question]
The source of corruption varies with the software setup. In general, of course, something must intrude somewhere in the chain -- server memory representation, writer process, OS file handle, journaling, IOSS, OS cache, disk, disk cache, internal file layout -- and wreak havoc.
Improper system shutdown can mess things up at several levels, preventing data from being written at any stage of the pipeline.
Manhandling the files on disk messes with the very last stage (using a pipeline of its own, of which the server knows nothing).
Other, more esoteric possibilities exist:
- subtle firmware/hardware failure in the hard disk itself, either:
  - accidental and probably unrecoverable, due to disk wear and tear, defective firmware, or even a defective firmware update (I seem to remember, some years back, a Hitachi acoustic-management update that could be run against a slightly different disk model; after the update the disk "thought" it had more cache than it actually had, and writes to the nonexistent areas of the cache of course went directly to bit heaven);
  - "intentional" and probably recoverable: it is sometimes possible to stretch your hard disk too thin using `hdparm`. Setting the disk for the very top performance is all well and good if every component is suited to that level of performance and knows it, or at least is able to signal that it is not; sometimes all the "warning" you get is a system malfunction (see the `hdparm` sketch after this list).
- process space or IOSS corruption: saw this on an Apache installation where somehow, probably thanks to a CGI that was suid root, the access.log file was filling with a stream of GIF images that were supposed to go to the user's browser. I fixed it and nothing happened, but had it been a more vital file instead of a log...? Such problems may be difficult to diagnose, and you might need to inspect all log files to see whether some application noticed or did anything untoward.
- hard disk sector relocation: fabled to happen, never seen it myself, but modern hard disks have "spare" sectors they will swap in for defective sectors to keep sporting a "zero defect" surface. Except that if the defective sector is no longer readable by the time it is swapped for an empty one, the net effect is the same as that sector suddenly being zeroed. This you can easily check using SMART reporting (`hddhealth` or `smartctl`; see the sketch after this list).
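On the `hdparm` point above: before tuning anything, at least record what the drive is currently doing (the device path is a placeholder; disabling the write cache trades speed for durability):

```
# Show the current write-caching flag (no argument = query only).
hdparm -W /dev/sda

# Cross-check against the drive's own capability report.
hdparm -I /dev/sda | grep -i 'write cache'

# To trade speed for safety you would disable the cache:
#   hdparm -W0 /dev/sda
```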
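And for the sector-relocation point, the relevant SMART counters look like this (again, /dev/sda is an assumption):

```
# A nonzero raw value means sectors have already been swapped out;
# a growing value means the disk is still degrading.
smartctl -A /dev/sda | grep -E 'Reallocated_Sector_Ct|Reallocated_Event_Count'
```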
Many other possibilities exist, of course, depending on the setup. Googling for file corruption turns up a jillion pages; useful terms to add to the query are the filesystem (ext4, NTFS, btrfs, ...), the hard disk make and model, the OS, the software suffering the problem, and other software installed.