I am a novice when it comes to database design and stumbled across the following problem (for a scientific database, findstat.org):
I have a list of some hundreds of "combinatorial maps" (each identified by an integer), each of which is specified by a list of some tens of thousands of preimage-image pairs, both of which are just integers. The preimages are in fact consecutive integers.
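To make the data concrete, here is a tiny sketch in Python of what the data for a single map looks like (the names "map_id" and "images" are just my placeholders):

    # toy illustration: the preimages are the consecutive integers 1, 2, 3, ...,
    # so it suffices to store the list of images in that order
    map_id = 17                      # the integer identifying the combinatorial map
    images = [3, 1, 4, 1, 5, 9, 2]   # images of 1, 2, 3, ... (tens of thousands in reality)

    # the preimage-image pairs, if needed explicitly
    pairs = list(enumerate(images, start=1))   # [(1, 3), (2, 1), (3, 4), ...]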
(EDIT) The table is only read and written as a whole. There is never a need to access only a few values. More precisely, the "natural" unit is the data associated with one "combinatorial map". (END OF EDIT)
The current design, which sort of works, is as follows:
The table has (essentially) two columns: the "ID of the combinatorial map" (an INT) and "values", a LONGTEXT containing the images of 1, 2, ..., separated by "|".
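As a sketch (the table and column names are invented for illustration, and I am using SQLAlchemy's declarative style), the current design corresponds to something like:

    from sqlalchemy import Column, Integer
    from sqlalchemy.dialects.mysql import LONGTEXT
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class MapValues(Base):
        # one row per combinatorial map; all images packed into a single text column
        __tablename__ = "map_values"                # invented name
        map_id = Column(Integer, primary_key=True)  # ID of the combinatorial map
        values = Column(LONGTEXT)                   # images of 1, 2, ..., separated by "|"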
I was told that this is not really proper design, and tried the following variation. The new table has three columns: the "ID of the combinatorial map", "preimage" and "image".
This new table thus has a very large number of rows (about 1,250,000 in my case).
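Again as a sketch with invented names, the new design would look roughly like this:

    from sqlalchemy import Column, Integer
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class MapEntry(Base):
        # one row per preimage-image pair, i.e. tens of thousands of rows per map
        __tablename__ = "map_entries"                 # invented name
        map_id = Column(Integer, primary_key=True)    # ID of the combinatorial map
        preimage = Column(Integer, primary_key=True)  # together with map_id: composite key
        image = Column(Integer)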
QUESTIONS:
Looking at the memory consumption, it appears that the new table uses much more memory. Is this to be expected? (EDIT: as it turns out, I am roughly doubling the memory consumption. This is rather surprising to me, because in the new design I store integers as INT, whereas in the old design I store them as text. A blog post by Mark Callaghan from 2009 mentions that the per-row overhead is roughly 20 bytes, which comes to about 24 MB for a table of 1,250,000 rows. I would have expected the overhead of storing integers as text to be much larger, but I was mistaken. END OF EDIT)
I populate this table, starting from an empty table, using Python and SQLAlchemy. For my application it seems right to first add all rows and then commit once. While this is unproblematic for the first design, with the new design my computer "ran out of memory" - that is, it started swapping until it became completely unresponsive. Is it likely that I did something wrong, or is the "new" design simply bad?
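For reference, here is a simplified sketch of how I populate the new table (the connection string and the data are placeholders, and MapEntry is the class from the sketch above); all rows are added to one session and committed only once at the end:

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    engine = create_engine("mysql://user:password@host/dbname")  # placeholder URL
    Session = sessionmaker(bind=engine)
    session = Session()

    # toy stand-in for the real data: {map_id: [image of 1, image of 2, ...]},
    # in reality hundreds of maps with tens of thousands of images each
    data = {17: [3, 1, 4, 1, 5], 42: [2, 7, 1, 8, 2]}

    for map_id, images in data.items():
        for preimage, image in enumerate(images, start=1):
            session.add(MapEntry(map_id=map_id, preimage=preimage, image=image))

    # a single commit at the very end, so all pending objects sit in the session until here
    session.commit()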