2

I'm trying to select every n-th row from mysql, I read this answer.

There is a table sys_request_log:

CREATE TABLE `sys_request_log` 
(
    `id` bigint(20) NOT NULL,
    `user_id` bigint(20) DEFAULT NULL,
    `ip` varchar(50) DEFAULT NULL,
    `data` mediumtext,
    `create_time` datetime DEFAULT NULL,
    PRIMARY KEY (`id`) USING BTREE,
    KEY `user_id` (`user_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

It contains 11837 rows.

I try to select every 5-th row from table, first I try to execute:

SELECT
    * 
FROM
    (SELECT @ROW := @ROW + 1 AS rownum, log.* FROM ( SELECT @ROW := 0 ) r, sys_request_log log ) ranked 
WHERE
    rownum % 5 = 1 

The result is:

rownum  id                  user_id             create_time
-------------------------------------------------------------------
 1      1271446699071639552 1                   2020-06-12 22:18:10
 6      1271446948980854784 1                   2020-06-12 22:19:10
11      1271447016878247936 1269884071484461056 2020-06-12 22:19:26

It costs 1.001s time

I found there is a unrelated column rownum. So I modify the SQL like this:

SELECT
    log.*
FROM
    (SELECT @ROW := @ROW + 1 AS rownum FROM (SELECT @ROW := 0) t) r,
    sys_request_log log
WHERE
    rownum % 5 = 1

Now the result is clean (no rownum), but It costs 2.516s time!

Why?

Mysql version: 5.7.26-log

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Julius Wang
  • 361
  • 1
  • 6
  • 14
  • Just a remark: There is no inherent order in a table. Rows are considered an unordered set of data. So, you get 20% of the table's rows alright, which is every 5th row, so to say, but in no particular order. You could get the first inserted 20% or the last inserted 20% or any other rows that make up 20%. It depends on how the DBMS has stored the data and accesses it. If you want an order, use `ORDER BY`. – Thorsten Kettner Jul 30 '20 at 08:18
  • Why have you tagged MariaDB? It looks like you are using MySQL 5.7.26. (It's worth upgrading to MySQL 8, by the way, where it is much easier to get each nth row with `ROW_NUMBER` - and in a standard-compliant way at that.) – Thorsten Kettner Jul 30 '20 at 08:21
  • @JuliusWang . . . I don't think the second query does what you want. – Gordon Linoff Jul 30 '20 at 12:11
  • @GordonLinoff, It's just remove `rownum` column, It works. – Julius Wang Jul 31 '20 at 04:02
  • @ThorstenKettner Sorry for that. I've removed the tag. I'm not DBA,so I can't upgrade mysql. – Julius Wang Jul 31 '20 at 04:04

3 Answers3

3

In the first case, the row number values are selected during the selection from the table(sys_request_log), but for the second case there occurs a cartesian product among subquery r and the selection from the table because of the CROSS JOIN occurence for each individual rownum versus each individual row value of the table.

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
1

If I understand correctly, you can do what you want by moving the variable assignment to the where clause:

select srl.*
from sys_request_log srl cross join
     (select @rn := 0) params
where (@rn := (@rn + 1)) % 5 = 1;

Note this happens to work in this case, because the query needs to do a full table scan and run the WHERE clause on each row. It might not work if the query has a JOIN, GROUP BY or ORDR BY.

The use of variables in this way is deprecated in MySQL now. You should upgrade and learn about window functions.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Your second query has different result from the first one and returns all rows, so take much more time from the first one.

To remove rownum from first query, Just name fields in SELECT clause.

try this:

SELECT
    ranked.id, ranked.user_id ,ranked.create_time
FROM
    ( SELECT @ROW := @ROW + 1 AS rownum, log.* FROM ( SELECT @ROW := 0 ) r, sys_request_log log ) ranked 
WHERE
    rownum % 5 = 1
Majid Hajibaba
  • 3,105
  • 6
  • 23
  • 55
  • yes. But in my opinion, I don't think an extra temporary table is necessery. – Julius Wang Jul 31 '20 at 04:44
  • There is no extra temporary table. Check the result of your second query. It returns all rows and is different from the first one which returns just every 5th record. I test your example in a table with 11 records. The first one return 3 and the second one return 11 records. So it takes more time. I just change your first query to cover your desired output. – Majid Hajibaba Jul 31 '20 at 05:50
  • you are right, I've tested. But why? I think mysql is make cartesian product first, so it is `1*n` line, which 1 is (select @row:=0) and n is table rows, then return 3 result by `where` condition. Why it return all results? Did I get it wrong? – Julius Wang Jul 31 '20 at 06:39
  • Use * instead of log.* to see your fault. – Majid Hajibaba Jul 31 '20 at 06:55