I have a big SQL table, about 30 GB, and I have deleted about half of it, so `information_schema` is not holding the right info (until the DB is optimized).
Is there a way to get the actual size? Using a full table scan?

- What are you looking for? The disk utilization? The number of rows? The "free" space? Did you look at `SHOW TABLE STATUS`? What is the value of `innodb_file_per_table`? – Rick James Nov 01 '16 at 23:13
- Looking for the sum of the exact size of each row -- in other words, the table size as it will be after running OPTIMIZE. – Dima Nov 02 '16 at 21:22
- That is essentially impossible to predict. Perhaps the best predictor is to watch the value you get and interpolate/extrapolate. – Rick James Nov 03 '16 at 22:59
- Yeah, now I know... thanks! Maybe I'll just create a replica and run OPTIMIZE there, to keep production running. – Dima Nov 04 '16 at 12:41
- BTW, I can't extrapolate because I save JSON blobs, and at one point they were huge, so the new data is much smaller. – Dima Nov 04 '16 at 12:42
1 Answer
With InnoDB, many numbers are rather fuzzy. The size of a single row is really not available. `SHOW TABLE STATUS` (and an equivalent probe into `information_schema`) gives you an estimate. But that estimate can be significantly off -- sometimes more than a factor of 2, high or low.
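For reference, the estimate in question is what you get from something along these lines (the table name is a stand-in for your own):

```sql
-- Rows, Avg_row_length, Data_length, and Data_free here are all estimates;
-- after a big DELETE they can be far from what OPTIMIZE would leave behind.
SHOW TABLE STATUS LIKE 'mytable';
```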
Here's a brief overview of InnoDB table layout.
The data is stored in a BTree of 16KB blocks, ordered by the `PRIMARY KEY`. (I won't discuss secondary indexes, which are in other BTrees.)
Inserting a row into such a structure may find room in the desired block, or it may necessitate a block split. Deleting a row will probably mark part of a block free, and can (rarely) return the block to "free space".
The "avg_row_length" is computed as the disk space minus the "free" blocks, then divided by the number of rows.
But that gets to another fuzzy number. The number of rows is estimated by making a few probes into the BTree to see how many rows per block there are, then doing some computation.
Then the row length is the fuzzy disk space (not accounting for empty space in each block) divided by the fuzzy row count.
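The same fuzzy inputs are visible in `information_schema.TABLES`, and you can do the division yourself to see where the reported number comes from. A small sketch, assuming a hypothetical `mydb.mytable`:

```sql
-- AVG_ROW_LENGTH is roughly DATA_LENGTH / TABLE_ROWS; both inputs are
-- estimates, so the result inherits their fuzziness.
SELECT  TABLE_ROWS,
        DATA_LENGTH,
        DATA_FREE,
        AVG_ROW_LENGTH,
        DATA_LENGTH / TABLE_ROWS AS recomputed_avg_row_length
    FROM  information_schema.TABLES
    WHERE  TABLE_SCHEMA = 'mydb'
      AND  TABLE_NAME = 'mytable';
```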
I have alluded to "Data_free". But note that inserting/deleting a row, when it does not change the number of blocks, does not change Data_free.
`TEXT` columns are (with some caveats, qualifications, and exceptions) stored in separate blocks. The allocation unit there is 16KB blocks. So, if you have any `TEXT` or `BLOB` columns, the computation gets really messy.
But I am not through... Tiny tables are allocated a few 16KB blocks, but when they get to be even "small", space is allocated 8MB at a time. Again, some of this can be seen in Data_free; much cannot.
The "free" space comes in 3 categories:
- Visible in "Data_free", but not freed up to the OS.
- Reusable space in blocks, as `UPDATEs` and `INSERTs` occur.
- Invisible overhead. Plan on the space for a table being 2-3 times as much as you would tally by taking the length of each column in each row.
Sorry, you are stuck with imprecise numbers.
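That said, the closest you can get to the question's "sum of the exact size of each row" is a full-table-scan tally of column lengths, which measures payload only and still ignores all of the overhead above. A minimal sketch, assuming the big JSON blobs live in a hypothetical column `payload` of `mytable`:

```sql
-- Full table scan: sums the byte length of each row's payload column.
-- This is raw payload, not the on-disk footprint; expect the real
-- tablespace to be noticeably larger (the 2-3x mentioned above).
-- Add LENGTH(...) terms for any other large columns.
SELECT  COUNT(*)                                     AS rows_scanned,
        SUM(LENGTH(payload))                         AS payload_bytes,
        ROUND(SUM(LENGTH(payload)) / POW(1024,3), 2) AS payload_gb
    FROM  mytable;
```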
Changing topics... Why are you doing big deletes? If you have a sliding time scale (think: news), `PARTITIONs` are excellent. If you are replacing all the data, then a `RENAME TABLE` trick comes to mind.
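For the "replacing all the data" case, a minimal sketch of that `RENAME TABLE` trick (table names are hypothetical):

```sql
-- Build the replacement off to the side, then swap it in.
CREATE TABLE mytable_new LIKE mytable;

-- ... load the fresh data into mytable_new ...

RENAME TABLE mytable     TO mytable_old,
             mytable_new TO mytable;

DROP TABLE mytable_old;  -- with innodb_file_per_table=ON, this returns the old file's space to the OS
```

Both renames happen in a single statement, so clients never see the table missing.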

- Based on date? Or something else? If based on date, consider [_this_](https://mariadb.com/kb/en/mariadb/partition-maintenance/) – Rick James Nov 03 '16 at 22:57