6

Evening,

I'm going through the long process of importing data from a battered, 15-year-old, read-only data format into MySQL to build some smaller statistical tables from it.

The largest table I have built before was (I think) 32 million rows, but I didn't expect it to get that big and was really straining MySQL.

The table will look like this:

surname    name    year    rel    bco    bplace    rco    rplace
Jones      David   1812    head   Lond   Soho      Shop   Shewsbury

So, small ints and varchars.

Could anyone offer advice on how to get this to workas quickly as possible? Would indexes on any of the coulmns help, or would they just slow queries down.

Much of the data in each column will be duplicated many times. Some fields don't have much more than about 100 different possible values.

The main columns I will be querying the table on are: surname, name, rco, rplace.

Kohjah Breese
  • 4,008
  • 6
  • 32
  • 48
  • 2
    Indexes speed up data retrieval and slow down data insertion. Generally speaking, they're a very good thing: I'd suggest creating them for each of the columns on which you're likely to query the table. – eggyal May 30 '12 at 23:25
  • 3
    30 million rows isn't very many for a modern RDBMS. Unless you're doing full table scans with no indexes, of course. – Greg Hewgill May 30 '12 at 23:26
  • Thanks for you comment. I think that was one of the problems with my 32 mil. row table. Inserting rows was causing crashes. – Kohjah Breese May 30 '12 at 23:26
  • 1
    Indexes will really help; their raison d'être is to *speed queries up*, **not** slow them down! Inserting the index at first will take a while as it will need to collect the values for the entire table, but after that you should find queries are a lot faster. – Asherah May 30 '12 at 23:28
  • 1
    hmm...are you having trouble inserting or querying a large dataset or both? – Skyrim May 31 '12 at 07:06
  • Neither. This was just q preliminary question. – Kohjah Breese May 31 '12 at 18:10

1 Answers1

5

INDEX on a column fastens the search. Try to INDEX columns that you would be using more often in queries. As you have mentioned you would be using the columns surname, name, rco, rplace. I'd suggest you index them.

Since the table has 32 million records, indexing would take sometime however it is worth the wait.

JHS
  • 7,761
  • 2
  • 29
  • 53
  • Thanks. I am just building the table now. So I can build the indexes on insert as opposed to on the c. 30 mil rows. – Kohjah Breese May 31 '12 at 16:25
  • 1
    @KohjahBreese: Check both ways, you may find that creating the index *once* on 30 million rows is faster than building the index incrementally. – Greg Hewgill Jun 01 '12 at 01:04