Pt-table-checksum原理浅析

Pt-table-checksum原理浅析

主库建一个表

1
2
3
4
5
node1> create table fan(id int) engine=innodb;
Query OK, 0 rows affected (0.12 sec)
node1> insert into fan values(1);
Query OK, 1 row affected (0.07 sec)

从库制造不一致

1
2
3
4
5
6
7
8
9
node2> select * from fan;
+------+
| id |
+------+
| 1 |
+------+
1 row in set (0.00 sec)
node2> update fan set id=2;

两边打开general log,然后

1
2
3
[root@node1 ~]# pt-table-checksum --nocheck-replication-filters --no-check-binlog-format h=172.16.83.17,u=root,p=mysql,P=3307 --replicate=percona.checksums --recursion-method=dsn=h=172.16.83.21,D=percona,t=dsns --databases=sysbench --tables=fan
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
11-06T05:12:58 0 1 1 1 0 0.344 sysbench.fan

先看两边的checksums表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
node1> select * from `percona`.`checksums` where tbl='fan';
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| db | tbl | chunk | chunk_time | chunk_index | lower_boundary | upper_boundary | this_crc | this_cnt | master_crc | master_cnt | ts |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| sysbench | fan | 1 | 0.019798 | NULL | NULL | NULL | 42981178 | 1 | 42981178 | 1 | 2017-11-06 05:12:58 |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
node2> select * from `percona`.`checksums` where tbl='fan';
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| db | tbl | chunk | chunk_time | chunk_index | lower_boundary | upper_boundary | this_crc | this_cnt | master_crc | master_cnt | ts |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| sysbench | fan | 1 | 0.019798 | NULL | NULL | NULL | 40deaf21 | 1 | 42981178 | 1 | 2017-11-06 05:12:58 |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
1 row in set (0.04 sec)

node1的checksums表的this_crc,this_cnt和master_crc,master_cnt肯定是一样的
因为.这边是replace into select 先计算this_crc,this_cnt .

1
2
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'sysbench', 'fan', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, CONCAT(ISNULL(`id`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sysbench`.`fan` /*checksum table*/
SHOW WARNINGS

然后再获取this_crc, this_cnt的值

1
SELECT this_crc, this_cnt FROM `percona`.`checksums` WHERE db = 'sysbench' AND tbl = 'fan' AND chunk = '1'

最后把master_crc,master_cnt更新成这个值

1
UPDATE `percona`.`checksums` SET chunk_time = '0.019798', master_crc = '42981178', master_cnt = '1' WHERE db = 'sysbench' AND tbl = 'fan' AND chunk = '1'

而到了从库这里.由于主库设置了binlog_format=statement 所以replace into select复制过去到从库还是语句,而不是值.此时从库执行这个语句,在自己的checksums表中根据自己的实际值算出了this_crc,this_cnt

1
2
REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'sysbench', 'fan', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, CONCAT(ISNULL(`id`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sysbench`.`fan` /*checksum table*/
COMMIT /* implicit, from Xid_log_event */

然后拿主库传过来的主库的this_crc, this_cnt的值更新自己的master_crc,master_cnt

1
2
3
BEGIN
UPDATE `percona`.`checksums` SET chunk_time = '0.019798', master_crc = '42981178', master_cnt = '1' WHERE db = 'sysbench' AND tbl = 'fan' AND chunk = '1'
COMMIT /* implicit, from Xid_log_event */

这样,从库的checksums表中就存储了自己和主库的hash值. 最后查一把,把结果返回输出到屏幕

1
2
3
4
5
6
7
8
9
SELECT MAX(chunk) FROM `percona`.`checksums` WHERE db='sysbench' AND tbl='fan' AND master_crc IS NOT NULL
SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM `percona`.`checksums` WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db='sysbench' AND tbl='fan')
这个结果为手动查询的. pt-table-checksum根据这个结果自己整理好格式输出到屏幕
+--------------+-------+-------------+----------------+----------------+----------+----------+----------+------------+----------+------------+
| table | chunk | chunk_index | lower_boundary | upper_boundary | cnt_diff | crc_diff | this_cnt | master_cnt | this_crc | master_crc |
+--------------+-------+-------------+----------------+----------------+----------+----------+----------+------------+----------+------------+
| sysbench.fan | 1 | NULL | NULL | NULL | 0 | 1 | 1 | 1 | 40deaf21 | 42981178 |
+--------------+-------+-------------+----------------+----------------+----------+----------+----------+------------+----------+------------+

关于两边查询一致性的保证

pt-table-checksum 会设置

1
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ

这样避免幻读,通过gap lock阻塞其他insert.保证肯定是replace into select先commit;这样肯定是replace into select先复制到从库(binlog是谁先提交记录谁)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
node1> select * from fan;
+----+------+
| id | name |
+----+------+
| 1 | fan |
| 5 | fan |
| 7 | fan |
| 9 | fan |
| 10 | fan |
| 11 | fan |
| 12 | fan |
| 13 | fan |
| 14 | fan |
| 15 | fan |
+----+------+
10 rows in set (0.00 sec)
node1> select * from fan2;
+----+------+
| id | name |
+----+------+
| 1 | duzi |
| 5 | duzi |
| 7 | duzi |
| 9 | duzi |
| 10 | duzi |
| 11 | duzi |
| 12 | duzi |
| 13 | duzi |
| 14 | duzi |
| 15 | duzi |
+----+------+
10 rows in set (0.00 sec)
node1> SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Query OK, 0 rows affected (0.00 sec)
node1> begin;
Query OK, 0 rows affected (0.00 sec)
node1> replace into fan2 select * from fan where id<7;
Query OK, 4 rows affected, 1 warning (0.01 sec)
Records: 2 Duplicates: 2 Warnings: 1

node2被锁

1
2
3
4
5
6
7
8
node2> SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Query OK, 0 rows affected (0.00 sec)
node2> begin;
Query OK, 0 rows affected (0.00 sec)
node2> insert into fan values(4,'ceshi');
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

show engine innodb status

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
------------
TRANSACTIONS
------------
Trx id counter 163F4
Purge done for trx's n:o < 163F0 undo n:o < 0
History list length 662
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 46, OS thread handle 0x7f42f81bf700, query id 62676 localhost root
show engine innodb status
---TRANSACTION 163F3, ACTIVE 5 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 45, OS thread handle 0x7f42b44cd700, query id 62675 localhost root update
insert into fan values(4,'ceshi')
------- TRX HAS BEEN WAITING 5 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 49 page no 3 n bits 80 index `PRIMARY` of table `sysbench`.`fan` trx id 163F3 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000005; asc ;;
1: len 6; hex 0000000163de; asc c ;;
2: len 7; hex ac00000cc70110; asc ;;
3: len 3; hex 66616e; asc fan;;

1
2
3
4
5
6
7
mysql> select * from INNODB_TRX ;
+--------+-----------+---------------------+-----------------------+---------------------+------------+---------------------+-----------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+
| trx_id | trx_state | trx_started | trx_requested_lock_id | trx_wait_started | trx_weight | trx_mysql_thread_id | trx_query | trx_operation_state | trx_tables_in_use | trx_tables_locked | trx_lock_structs | trx_lock_memory_bytes | trx_rows_locked | trx_rows_modified | trx_concurrency_tickets | trx_isolation_level | trx_unique_checks | trx_foreign_key_checks | trx_last_foreign_key_error | trx_adaptive_hash_latched | trx_adaptive_hash_timeout |
+--------+-----------+---------------------+-----------------------+---------------------+------------+---------------------+-----------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+
| 163F3 | LOCK WAIT | 2017-11-06 05:56:40 | 163F3:49:3:3 | 2017-11-06 05:56:40 | 2 | 45 | insert into fan values(4,'ceshi') | inserting | 1 | 1 | 2 | 376 | 1 | 0 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 10000 |
| 163F2 | RUNNING | 2017-11-06 05:56:29 | NULL | NULL | 6 | 47 | NULL | NULL | 0 | 0 | 4 | 1248 | 5 | 2 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 10000 |
+--------+-----------+---------------------+-----------------------+---------------------+------------+---------------------+-----------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+

Powered by Hexo and Hexo-theme-hiker

Copyright © 2013 - 2018 Fan() All Rights Reserved.

访客数 : | 访问量 :

#