为什么pt-osc和gh-osc在拷贝源表数据时要使用insert IGNORE into select lock in share mode

insert IGNORE into select lock in share mode 的作用

pt-osc和gh-osc在拷贝旧数据时逻辑是一样的, 都是用insert ignore into 影子表 select * from 原表force index (PRIMARY) where chunk范围 lock in share mode

1
2
3
4
5
6
7
pt-osc
INSERT LOW_PRIORITY IGNORE INTO `sysbench`.`_sbtest1_new` (`id`, `k`, `c`, `pad`, `snum`) SELECT `id`, `k`, `c`, `pad`, `snum` FROM `sysbench`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '98802')) AND ((`id` <= '98901')) LOCK IN SHARE MODE /*pt-online-schema-change 6012 copy nibble*/

gh-ost
insert /* gh-ost `sysbench`.`sbtest1` */ ignore into `sysbench`.`_sbtest1_gho` (`id`, `k`, `c`, `pad`, `snum`)
(select `id`, `k`, `c`, `pad`, `snum` from `sysbench`.`sbtest1` force index (`PRIMARY`)
where (((`id` > _binary'419601')) and ((`id` < _binary'419701') or ((`id` = _binary'419701')))) lock in share mode

为什么需要lock in share mode ?

假设修改表t1, t1有两列 id, sname, id为主键, 数据如下

1
2
3
4
5
6
7
8
id,sname
1,'foo'
10,'foo'
20,'foo'
30,'foo'
40,'foo'
...
1000万,'foo'

开始改表, 需要拷贝原始数据
拷贝原表数据语句A:

1
insert IGNORE into _t1_影子 select * from t1 where id>1 and id<100万;

我们这个insert正在执行,且执行的很慢, 总是就是放大它的执行时间. 此时另一个会话执行下面的语句B

1
2
3
insert into t1 values(11,'foo');
update t1 set sname='hehe' where id=20;
delete from t1 where id=30;

如果是pt-osc ,走触发器, 这些语句会应用到影子表

pt-osc 三个触发器

1
2
3
4
5
>CREATE DEFINER=`fanboshi`@`%` TRIGGER `pt_osc_sysbench_sbtest1_del` AFTER DELETE ON `sysbench`.`sbtest1` FOR EACH ROW DELETE IGNORE FROM `sysbench`.`_sbtest1_new` WHERE `sysbench`.`_sbtest1_new`.`id` <=> OLD.`id`

>CREATE DEFINER=`fanboshi`@`%` TRIGGER `pt_osc_sysbench_sbtest1_upd` AFTER UPDATE ON `sysbench`.`sbtest1` FOR EACH ROW BEGIN DELETE IGNORE FROM `sysbench`.`_sbtest1_new` WHERE !(OLD.`id` <=> NEW.`id`) AND `sysbench`.`_sbtest1_new`.`id` <=> OLD.`id`;REPLACE INTO `sysbench`.`_sbtest1_new` (`id`, `k`, `c`, `pad`, `snum`) VALUES (NEW.`id`, NEW.`k`, NEW.`c`, NEW.`pad`, NEW.`snum`);END

>CREATE DEFINER=`fanboshi`@`%` TRIGGER `pt_osc_sysbench_sbtest1_del` AFTER DELETE ON `sysbench`.`sbtest1` FOR EACH ROW DELETE IGNORE FROM `sysbench`.`_sbtest1_new` WHERE `sysbench`.`_sbtest1_new`.`id` <=> OLD.`id`
1
2
3
REPLACE INTO _t1_影子 values(11,'foo');
REPLACE INTO _t1_影子 values(20,'hehe'); (如果更新了主键值, 则会delete where id=原主键值, replace into 新主键值)
DELETE IGNORE FROM _t1_影子 where id=30;

此时影子表会有如下数据

1
2
11,'foo'
20,'hehe'

之后拷贝原表数据语句A执行完毕, 影子表数据如下

1
2
3
4
5
6
7
8
1,'foo'
10,'foo'
11,'foo' --
20,'hehe' --
30,'foo' --30又回来了
40,'foo'
...
100,'foo'

被删除的id=30又会被插入进来

所以数据就出现了问题, 所以需要insert IGNORE into select lock in share mode, 这样语句B就会被阻塞

这也是为什么pt-osc对update不能像gh-ost一样使用update, 而是需要replace into
数据还是有问题,所以需要lock in share mode

如果是gh-ost, 走binlog, gh-ost解析binlog应用到影子表

https://github.com/github/gh-ost/blob/e48844de0bee9a8db611a06cd6080cac4dab25cb/go/sql/builder.go
insert就是replace into
update还是update (如果更新了主键值, 则会delete where id=原主键值, replace into 新主键值)
delete还是delete

1
2
3
REPLACE INTO _t1_影子 values(11,'foo');
update _t1_影子 set sname='hehe' where id=20; --更新不到
delete from _t1_影子 where id=30; --删除不到

此时影子表会有如下数据

1
11,'foo'

之后拷贝原表数据语句A执行完毕, 影子表数据如下

1
2
3
4
5
6
7
8
1,'foo'
10,'foo'
11, 'foo'
20,'foo' --更新丢了
30,'foo' --删除也丢了
40,'foo'
...
100,'foo'

数据还是有问题, 所以需要lock in share mode

Powered by Hexo and Hexo-theme-hiker

Copyright © 2013 - 2022 Fan() All Rights Reserved.

访客数 : | 访问量 :