3

I have a MySQL table with over 16 million rows and no primary key. Whenever I try to add one, the connection crashes. I have tried adding an auto-increment column in phpMyAdmin and from the shell, but the connection is always lost after about 10 minutes.

What I would like to do is loop through the table's rows in PHP, limiting the number of rows per query, and assign an auto-incremented ID to each returned row. Since each query would touch far fewer rows, the load on MySQL would be lower and I shouldn't lose the connection.

I want to do something like

    SELECT * FROM MYTABLE LIMIT 1000001, 2000000;

Then, in the loop, update the current row

    UPDATE (current row) SET ID='$i++'

How do I do this?

Note: the original data was given to me as a txt file. I don't know if there are duplicates, but I cannot eliminate any rows. Also, no rows will be added; this table is going to be used only for querying purposes. Adding indexes, however, has caused no problems.

tcam27
  • No primary key is rarely a good thing. Generally it is a bad idea, especially with 16,000,000 records... Don't you have any unique field you could convert into a primary key? This can be hard, anyway. – fusion3k Mar 07 '16 at 15:21
  • Yes agree with @fusion3k. Don't you have any other field which exposes uniqueness among the rows? – 1000111 Mar 07 '16 at 15:22
  • Ensure one thing do you have duplicate rows? [Duplicate means all the columns of different rows have same values in each other] @tcam27 – 1000111 Mar 07 '16 at 15:28
  • Do you have access to your database server? I think it would be much better to run a command to alter the table, add an ID column, and make it auto-increment – Ronald Nsabiyera Mar 07 '16 at 15:29
  • So if you get this hack-job PHP script to add unique IDs to your rows, what's going to happen when row #16,000,001 is inserted? If MySQL is not auto-incrementing then you are gonna have a bad time. – MonkeyZeus Mar 07 '16 at 15:57
  • The database table was given to me as a text file by a 3rd party and they were not very good about adding IDs. They had one but it was filled with duplicates. – tcam27 Mar 07 '16 at 17:10
  • I have shell access and I lose the connection after trying to add an ID column with an auto increment after a few minutes. MySQL gives me an error if I don't also make that auto_increment the primary key – tcam27 Mar 07 '16 at 17:12
  • I should note that no data will be added to this table. This is only for querying purposes. – tcam27 Mar 07 '16 at 17:18

2 Answers

1

I suspect you are trying to use phpMyAdmin to add the index. As handy as it is, it is a PHP script, and it is limited to the same resources as any other PHP script on your server: typically 30-60 seconds of run time and a limited amount of RAM.

I suggest you work out the MySQL statement you need to add the index, then SSH into the server and run it with the command-line MySQL client.
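For example, a sketch of the statement (MYTABLE and ID are placeholders here; the exact column type and position are up to you):

    ALTER TABLE MYTABLE
        ADD COLUMN ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

Since your connection has been dropping after about 10 minutes, consider launching the client inside screen or tmux; that way the statement keeps running even if the SSH session dies, and you can check on it from a fresh connection with SHOW PROCESSLIST.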

Duane Lortie
  • I tried adding the primary key via command line but I lose the connection after about 10 minutes. Oddly, I have added indexes via command line MySQL with no problems. – tcam27 Mar 07 '16 at 17:08
  • Despite you losing the connection, the MySQL server *should* have continued doing as it was told. 16 million is just a freakishly big data set... If it is a MyISAM table, you'd likely see a performance boost by converting it to InnoDB via phpMyAdmin's operations page. phpMyAdmin should allow you to start that process, but will of course time out before the job is finished... regardless, again, it should complete the job; just make sure the database is not being accessed while it completes. – Duane Lortie Mar 08 '16 at 19:15
  • It is an InnoDB table and nothing else is accessing it as it runs – tcam27 Mar 08 '16 at 21:41
0

If you don't have duplicate rows, then the following approach might shed some light.

Suppose you want to set the auto-incremented ID for the first 10,000 rows:

    UPDATE MYTABLE
    INNER JOIN (
        SELECT
            *,
            @rn := @rn + 1 AS row_number
        FROM MYTABLE, (SELECT @rn := 0) var
        ORDER BY SOME_OF_YOUR_FIELD
        LIMIT 0, 10000
    ) t
      ON  t.field1 = MYTABLE.field1
      AND t.field2 = MYTABLE.field2
      AND ...
      AND t.fieldN = MYTABLE.fieldN
    SET MYTABLE.ID = t.row_number;

For the next 10,000 rows you only need to change two things:

  1. (SELECT @rn := 10000) var
  2. LIMIT 10000,10000

Repeat for each subsequent batch of 10,000.
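With those two changes applied, the second batch would look like this (field names are placeholders, as in the query above):

    UPDATE MYTABLE
    INNER JOIN (
        SELECT *, @rn := @rn + 1 AS row_number
        FROM MYTABLE, (SELECT @rn := 10000) var
        ORDER BY SOME_OF_YOUR_FIELD
        LIMIT 10000, 10000
    ) t
      ON  t.field1 = MYTABLE.field1
      AND ...
      AND t.fieldN = MYTABLE.fieldN
    SET MYTABLE.ID = t.row_number;

The counter now starts at 10,000, so the derived table numbers rows 10,001 through 20,000 of the same ordering.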

Note: the ORDER BY SOME_OF_YOUR_FIELD clause is important; without it the rows would come back in an unpredictable order and the numbering would not be stable across batches. Since you need to repeat the process, it is better to create a function that takes limit and offset as parameters and does this job.

Explanation:

The idea is to build a derived table t containing N rows, each assigned a unique row number. That table is then inner-joined back to the main table MYTABLE on all the fields, and the ID field of each matching row in MYTABLE is updated with that row's row_number.

Another IDEA:

You may use multithreading in PHP to do this job.

  1. Create N threads.
  2. Assign each thread a non-overlapping range (1 to 10,000, 10,001 to 20,000, etc.), as in the query above.

Caution: the query gets slower as the offset grows.

1000111
  • I don't know if there are duplicate rows. The data was given to me as a txt file by a 3rd party. – tcam27 Mar 07 '16 at 17:15
  • No problem. If duplicate rows exist, you can remove them later, after doing this job. @tcam27 – 1000111 Mar 07 '16 at 17:19