大表分页查询调研

当一个表数据量特别大的时候,使用分页limit操作,比如limit 10000000,5 会导致查询特别缓慢。
究竟是什么原因导致的呢?怎么去解决这个问题呢?

一、直接查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
mysql> select * from user_status_record_0 limit 5,5;
5 rows in set (0.01 sec)

mysql> select * from user_status_record_0 limit 500000,5;
5 rows in set (0.52 sec)

mysql> select * from user_status_record_0 limit 5000000,5;
5 rows in set (5.19 sec)

mysql> select * from user_status_record_0 limit 4900000,5;
5 rows in set (1.56 sec)

mysql> select * from user_status_record_0 order by id limit 5000010,5;
5 rows in set (1.59 sec)

mysql> select * from user_status_record_0 limit 9900000,5;
5 rows in set (6.06 sec)

mysql> explain select * from user_status_record_0 limit 5000000,5;
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-------+
| 1 | SIMPLE | user_status_record_0 | ALL | NULL | NULL | NULL | NULL | 28188569 | |
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-------+
1 row in set (0.00 sec)

说明:

  • limit5000000,5 没有把所有的数据都查询出来,只是查询到5000005个。
  • 缓存也是只是缓存查询出来的数据。
  • 查询9900000的时候,数据走了缓存。
  • 没有加索引,需要全表查询,这个时候explain的rows为全表行数。

二、加上order by id

1
2
3
4
5
6
7
8
9
10
11
12
13
mysql> select * from user_status_record_1 order by id limit 5,5;
5 rows in set (0.01 sec)

mysql> select * from user_status_record_1 order by id limit 5000000,5;
5 rows in set (5.15 sec)

mysql> explain select * from user_status_record_1 order by id limit 5000000,5;
+----+-------------+----------------------+-------+---------------+---------+---------+------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+---------------+---------+---------+------+---------+-------+
| 1 | SIMPLE | user_status_record_1 | index | NULL | PRIMARY | 8 | NULL | 5000005 | |
+----+-------------+----------------------+-------+---------------+---------+---------+------+---------+-------+
1 row in set (0.01 sec)

说明:

  • 加不加order by id其实效率并没有改变。
  • 虽然走了索引,但是其实还是全表查询。
  • 因为加上了索引,所以进行优化后,explain的rows进行了减少。

三、加上子查询

3.1、在user_status_record表中实验

1
2
3
4
5
6
7
8
9
10
11
12
13
14
mysql> select * from user_status_record_13 limit 9000000,5;
5 rows in set (10.19 sec)

mysql> select * from user_status_record_14 where id > (select id from user_status_record_14 limit 9000000,1) limit 5;
5 rows in set (48.89 sec)

mysql> explain select * from user_status_record_14 where id > (select id from user_status_record_14 limit 9000000,1) limit 5;
+----+-------------+-----------------------+-------+---------------+-------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------------+-------+---------------+-------------+---------+------+----------+-------------+
| 1 | PRIMARY | user_status_record_14 | range | PRIMARY | PRIMARY | 8 | NULL | 13674948 | Using where |
| 2 | SUBQUERY | user_status_record_14 | index | NULL | idx_pass_id | 8 | NULL | 27349896 | Using index |
+----+-------------+-----------------------+-------+---------------+-------------+---------+------+----------+-------------+
2 rows in set (1.25 sec)

3.2、说明

  • 使用了覆盖索引,结果速度更慢了。

3.3、在antispam表实验

1
2
3
4
5
6
CREATE TABLE `anti_health_rate_43` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`create_day` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '创建日期',
PRIMARY KEY (`id`),
KEY `idx_create_day` (`create_day`)
) ENGINE=InnoDB AUTO_INCREMENT=2945980 DEFAULT CHARSET=utf8 COMMENT='用户健康度记录表';
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
mysql>  select * from anti_health_rate_11 limit 2000010, 1;
1 row in set (5.32 sec)

mysql> select * from anti_health_rate_12 order by id limit 2000010, 1;
1 row in set (4.57 sec)

mysql> select * from anti_health_rate_13 order by create_day limit 2000010, 1;
1 row in set (10.06 sec)

mysql> SELECT * FROM anti_health_rate_14 WHERE id >=(select id from anti_health_rate_14 limit 2000010, 1) limit 5;
5 rows in set (0.89 sec)

mysql> explain SELECT * FROM anti_health_rate_43 WHERE id >=(select id from anti_health_rate_43 limit 2000010, 1) limit 5;
+----+-------------+---------------------+-------+---------------+----------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------+---------------+----------------+---------+------+---------+-------------+
| 1 | PRIMARY | anti_health_rate_43 | range | PRIMARY | PRIMARY | 8 | NULL | 1779770 | Using where |
| 2 | SUBQUERY | anti_health_rate_43 | index | NULL | idx_create_day | 4 | NULL | 3559541 | Using index |
+----+-------------+---------------------+-------+---------------+----------------+---------+------+---------+-------------+
2 rows in set (0.27 sec)

3.4 说明

  • select id from anti_health_rate_14 limit 2000010, 1使用了覆盖索引,覆盖索引因为需要的页更少,叶子节点数据更少,所以直接速度更快。
  • 辅助索引对应的id的排序有关,加载更少的页,磁盘IO操作更少(聚集索引的叶子节点包含一行数据,辅助索引叶子节点包含主键ID)。
  • 但是有问题,子查询是按照create_day分页,主查询是按照id分页。

四、between

1
2
3
4
5
6
7
8
mysql> select * from user_status_record_6 where id between 9000010 and 9000015;
+---------+------------+----------------+------+-----------------------------+-------------+----------+
| id | pass_id | order_id | type | type_value | create_time | category |
+---------+------------+----------------+------+-----------------------------+-------------+----------+
| 9000010 | 1440289806 | 14703922231285 | 16 | 38:BC:1A:B0:5C:721694573069 | 1470392233 | 1 |
| 9000011 | 2219729906 | 14703922265897 | 21 | 1470344300#1580198378 | 1470392235 | 1 |
+---------+------------+----------------+------+-----------------------------+-------------+----------+
2 rows in set (0.01 sec)

说明

  • 速度快,直接使用聚集索引
  • 查询结果不准确,因为可能会有id不是连续的。

五、总结

5.1、方法一:对分页排序的字段建立唯一索引(id)

select * from table where id > (select id from table limit 10,1) limit 10

5.2、代码层实现,通过两次查询。

1
2
$ret = select id from table order by aaa limit 100,10
select * from table where id in ( $ret );
文章目录
  1. 1. 一、直接查询
    1. 1.0.1. 说明:
  • 2. 二、加上order by id
    1. 2.0.1. 说明:
  • 3. 三、加上子查询
    1. 3.0.1. 3.1、在user_status_record表中实验
    2. 3.0.2. 3.2、说明
    3. 3.0.3. 3.3、在antispam表实验
    4. 3.0.4. 3.4 说明
  • 4. 四、between
    1. 4.0.1. 说明
  • 5. 五、总结
    1. 5.0.1. 5.1、方法一:对分页排序的字段建立唯一索引(id)
    2. 5.0.2. 5.2、代码层实现,通过两次查询。
  • ,