1

I'm storing database dumps into a MySQL table, so, I intend to store a maximum of 10 different dumps. The 11th different dump and on should have data set to NULL, to save space but keep some info available.

Same dumps have the same drive_md5.

 CREATE TABLE `dumps` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`db_id` int(11) NOT NULL,
`drive_id` char(28) DEFAULT NULL,
`drive_md5` char(32) DEFAULT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`data` longblob,
`filesize` mediumint(8) unsigned DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `drive_md5` (`drive_md5`),
 KEY `db_id` (`db_id`)
) ENGINE=MyISAM AUTO_INCREMENT=43 DEFAULT CHARSET=utf8

If you need more info, please leave a comment so I can edit the question. Thanks

Edit: I'm accepting a clearer title's suggestion.

enapupe
  • 15,691
  • 3
  • 29
  • 45

2 Answers2

1

To clear the data from all excepts the 10 last record

UPDATE  dumps
SET     data = null 
WHERE   id NOT IN (SELECT TOP 10 Id from dumps ORDER BY Id DESC )

This does not work when you are looking for the last 10 per column. In this case md5. You can achieve this using the row count function. In order to do this you need a query that counts the rows per MD5. Example below.

SELECT 
id         AS Id_Of_Records_To_Clean,
drive_md5  AS drive_md5_for_Reference
FROM
(
/*Sub query numbering the entries in the dumps table, from 1 to n per drive_m5*/
SELECT TOP 100 PERCENT
ROW_NUMBER() OVER(PARTITION BY drive_md5  ORDER BY drive_md5 , id DESC) AS RowNum,
drive_md5 , Id
ORDER BY drive_md5 , id DESC
) AS RecordsToClearOrDelete
WHERE
RowNum > 10

The query above will retun the record ID's (PRIMARY KEY) of the records to clean. You can then use a query to UPDATE 'data' where ID in (....Query above...)

See - ROW OVER with PARTITION for MySQL

Hope I have understood , and hope this help.

Community
  • 1
  • 1
Richard Vivian
  • 1,700
  • 1
  • 14
  • 19
  • What about group by and offset? group by db_id, drive_md5 limit 10,x? I still haven't figured a way – enapupe May 11 '14 at 20:24
  • If you need the last X items by db_id and drive_md5, then the row over needs to partition by db_id , drive_md5. This makes the row number in the inner query reset each break in db_id , drive_md5. That should work if I understand. Not sure about offset? – Richard Vivian May 11 '14 at 20:28
  • I think it helps but I couldn't understand the whole thing.. I have these database ROWS refereeing database dumps. I want to keep 10 dumps that have DATA for each DATABASE. That is it. A db may have only a sequence of 10 dumps with data. Or a interval with N rows without data, but the sum of rows with data per database must be 10. – enapupe May 11 '14 at 21:04
  • As a test select query, use the query posted above, but use RowNum < 10. This will show you the records that you should keep per database. Essentially the ROW_NUMBER() function will allow you to create an ordered number . The Partition allows you to reset the ROW_NUMBER() to 1 for each change in partition. The order by allows you to number Increasing or decreasing. All together you can then get the Last 10 or first 10 records for each DB, each Drive, or any combination of columns based on the partition. – Richard Vivian May 11 '14 at 21:15
  • Is select top 100 percent a MySQL thing? I'm having sql error: http://paste.debian.net/98921/ – enapupe May 11 '14 at 21:23
  • I'm considering running two loops: `select id from dumps where data is not null and db_id = X limit 10,100` to fetch the IDS and another query to update data set data = null. – enapupe May 11 '14 at 21:44
  • The top 100 percent is needed when you have an order by in the inner query. You may be able to leave the TOP 100 PERCENT out, but then you will have to remove the ORDER BY. The ROW_NUMBER's ORDER BY should be adequate. – Richard Vivian May 12 '14 at 05:12
  • I couldn't do it. I'm voting up and choosing as correct answer anyhow, thanks for your attention. – enapupe May 12 '14 at 14:12
  • Thanks for the up-mark. If you post some sample data with expected results I can try to improve the query. – Richard Vivian May 12 '14 at 14:16
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/52528/discussion-between-richard-vivian-and-enapupe) – Richard Vivian May 12 '14 at 14:17
0

As pointed in the commentaries, the best way to accomplish this was before insert, not after. So, I'm running a md5 algorithm in the data and checking against database where md5 = md5, data is NOT null and db = db. So, if there is a record WITH data and the same db/md5, there is no need to feed the data column.

Now, I just need a cleanup query, where it should empty DATA after the 10th row where data is not null.

enapupe
  • 15,691
  • 3
  • 29
  • 45