死锁案例2 两个UPDATE死锁

问题描述

2020.05.18 16:12 收到报警

登录主机, 查看error log( 开启了innodb_print_all_deadlocks参数)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
*** (1) TRANSACTION:

TRANSACTION 1261729280, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 3645828, OS thread handle 139681739994880, query id 1522438991 172.16.23.82 yos_rw Searching rows for update
UPDATE f_log SET D_UPDATED_AT = NOW() , I_STATUS = 17 WHERE I_STATUS = 1 AND I_TYPE = 1
2020-05-18T16:11:11.696061+08:00 3056367 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 1208 page no 29439 n bits 160 index PRIMARY of table `yos`.`f_log` trx id 1261729280 lock_mode X locks rec but not gap waiting
2020-05-18T16:11:11.696079+08:00 3056367 [Note] InnoDB: *** (2) TRANSACTION:

TRANSACTION 1261729278, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
10 lock struct(s), heap size 1136, 7 row lock(s), undo log entries 3
MySQL thread id 3056367, OS thread handle 139682403342080, query id 1522438990 172.16.23.82 yos_rw updating
UPDATE f_log SET D_UPDATED_AT='2020-05-18 16:11:11.693998',I_STATUS=3 WHERE (I_FILE_ID='92b52d6589ce11a9a3c985e7c5f37462efb66ac7')
2020-05-18T16:11:11.696099+08:00 3056367 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 1208 page no 29439 n bits 160 index PRIMARY of table `yos`.`f_log` trx id 1261729278 lock_mode X locks rec but not gap
2020-05-18T16:11:11.696112+08:00 3056367 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 1208 page no 17 n bits 1120 index idx_status_type of table `yos`.`f_log` trx id 1261729278 lock_mode X locks rec but not gap waiting
2020-05-18T16:11:11.696223+08:00 3056367 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)

简单解释一下上面的死锁日志

TRANSACTION 1261729280, MySQL thread id 3645828, 最后一条执行的sql为

1
UPDATE f_log SET D_UPDATED_AT = NOW() , I_STATUS = 17 WHERE I_STATUS = 1 AND I_TYPE = 1

它在等待获取主键上的记录锁

1
RECORD LOCKS index PRIMARY of table `yos`.`f_log` lock_mode X locks rec but not gap waiting

TRANSACTION 1261729278, MySQL thread id 3056367, 最后一条执行的sql为

1
UPDATE f_log SET D_UPDATED_AT='2020-05-18 16:11:11.693998',I_STATUS=3 WHERE (I_FILE_ID='92b52d6589ce11a9a3c985e7c5f37462efb66ac7')

持有主键记录锁

1
RECORD LOCKS index PRIMARY of table `yos`.`f_log` lock_mode X locks rec but not gap

等待在二级索引idx_status_type上加记录锁

1
RECORD LOCKS index idx_status_type of table `yos`.`f_log` lock_mode X locks rec but not gap waiting

表结构

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
root@localhost 16:18:24 [yos]> show create table f_log\G
*************************** 1. row ***************************
Table: f_log
Create Table: CREATE TABLE `f_log` (
`I_ID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '自增主键',
`I_CHANNEL_ID` bigint(20) NOT NULL COMMENT '通道ID',
`I_FILE_ID` varchar(100) NOT NULL DEFAULT '' COMMENT '文件ID',
`I_TYPE` tinyint(4) NOT NULL COMMENT '日志类型1 同步 2 删除',
`I_STATUS` tinyint(1) NOT NULL COMMENT '同步标志 0 未同步 1 同步中 2 已同步 3',
`D_CREATED_AT` datetime DEFAULT NULL COMMENT '创建时间',
`D_UPDATED_AT` datetime DEFAULT NULL COMMENT '修改时间',
PRIMARY KEY (`I_ID`),
KEY `idx_updatedat_status_type` (`D_UPDATED_AT`,`I_STATUS`,`I_TYPE`),
KEY `idx_status_type` (`I_STATUS`,`I_TYPE`),
KEY `idx_file_id` (`I_FILE_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=2011346 DEFAULT CHARSET=utf8 COMMENT='文件日志表'
1 row in set (0.00 sec)

事务隔离级别为READ-COMMITTED

死锁这块我比较弱, 其实厉害的话根据上面的信息已经可以看出原因了.

排查过程

我当时去翻审计日志, 通过语句中的一些关键信息, 如 `thread id 3645828/3056367’ 时间范围, 找出了如下信息

TRANSACTION (2) 也就是

1
2
3
4
5
TRANSACTION 1261729278, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
10 lock struct(s), heap size 1136, 7 row lock(s), undo log entries 3
MySQL thread id 3056367, OS thread handle 139682403342080, query id 1522438990 172.16.23.82 yos_rw updating
UPDATE f_log SET D_UPDATED_AT='2020-05-18 16:11:11.693998',I_STATUS=3 WHERE (I_FILE_ID='92b52d6589ce11a9a3c985e7c5f37462efb66ac7')

执行语句如下:

1
2
3
4
5
6
7
8
9
START TRANSACTION
SELECT I_FILE_ID,I_FILE_COUNT,I_FILE_STATUS,I_IS_EXPIRED,I_FILE_VALIDDAY,I_FILE_ACCESS,CH_FILE_NAME,CH_FILE_TYPE,I_BUSINESS_ID,D_CREATED_AT,D_UPDATED_AT,I_YOS_VERSION,D_EXPIRED_AT FROM f_file_info WHERE I_FILE_ID = '92b52d6589ce11a9a3c985e7c5f37462efb66ac7' FOR UPDATE
SELECT i_channel_id FROM f_file_storage WHERE i_file_id = '92b52d6589ce11a9a3c985e7c5f37462efb66ac7' AND i_storage_status = 1
UPDATE f_file_info SET i_file_count = 2, d_updated_at = '2020-05-18 16:11:11.687696' WHERE i_file_id = '92b52d6589ce11a9a3c985e7c5f37462efb66ac7'
SELECT I_ID,I_FILE_ID,I_STORAGE_STATUS,I_CHANNEL_ID,D_CREATED_AT,D_UPDATED_AT FROM f_file_storage WHERE i_file_id = '92b52d6589ce11a9a3c985e7c5f37462efb66ac7' AND i_channel_id = 1 FOR UPDATE
UPDATE f_file_storage SET i_storage_status = 1 , d_updated_at = '2020-05-18 16:11:11.690879' WHERE i_file_id = '92b52d6589ce11a9a3c985e7c5f37462efb66ac7' AND i_channel_id = 1
SELECT count(*) FROM f_file_storage WHERE (I_FILE_ID='92b52d6589ce11a9a3c985e7c5f37462efb66ac7' AND I_STORAGE_STATUS=1)
UPDATE f_log SET D_UPDATED_AT='2020-05-18 16:11:11.693998',I_STATUS=3 WHERE (I_FILE_ID='92b52d6589ce11a9a3c985e7c5f37462efb66ac7')
COMMIT

TRANSACTION (1) 也就是

1
2
3
4
5
TRANSACTION 1261729280, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 3645828, OS thread handle 139681739994880, query id 1522438991 172.16.23.82 yos_rw Searching rows for update
UPDATE f_log SET D_UPDATED_AT = NOW() , I_STATUS = 17 WHERE I_STATUS = 1 AND I_TYPE = 1

执行语句如下:

1
2
Prepare
UPDATE f_log SET D_UPDATED_AT = NOW() , I_STATUS = 17 WHERE I_STATUS = 1 AND I_TYPE = 1

到这里想不出原因了, 因为两个会话实际上就是update语句死锁了. 我但是没想明白为啥UPDATE f_log SET D_UPDATED_AT='2020-05-18 16:11:11.693998',I_STATUS=3 WHERE (I_FILE_ID='92b52d6589ce11a9a3c985e7c5f37462efb66ac7') 要请求在二级索引idx_status_type上加记录锁

后来还是和研发讨论, 研发提醒了我update要锁被修改列二级索引. 我一开始还说不对… 后来想想有道理

https://www.aneasystone.com/archives/2017/12/solving-dead-locks-three.html

那么到这里就说通了, 这就是为啥UPDATE f_log SET D_UPDATED_AT='2020-05-18 16:11:11.693998',I_STATUS=3 WHERE (I_FILE_ID='92b52d6589ce11a9a3c985e7c5f37462efb66ac7') 要请求在二级索引idx_status_type上加记录锁

那么实际这个死锁是这样产生的

事务1,2 开始时 I_FILE_ID=’92b52d6589ce11a9a3c985e7c5f37462efb66ac7’ 对应记录的 I_STATUS = 1 , I_TYPE = 1

事务2执行语句

1
UPDATE f_log SET D_UPDATED_AT='2020-05-18 16:11:11.693998',I_STATUS=3 WHERE (I_FILE_ID='92b52d6589ce11a9a3c985e7c5f37462efb66ac7')

I_FILE_ID列二级索引idx_file_id, 根据二级索引中的主键值找到主键索引对应记录加记录锁, 因为是update语句更新了I_STATUS列, 且I_STATUS列有二级索引idx_status_type所以还要去修改二级索引中对应记录, 于是也要锁idx_status_type中的记录I_STATUS = 1 , I_TYPE = 1, I_ID=2010712(原值)和I_STATUS = 3 , I_TYPE = 1, I_ID=2010712(新值).

但此时事务2还未在idx_status_type对应记录加上锁, 事务1执行了语句

1
UPDATE f_log SET D_UPDATED_AT = NOW() , I_STATUS = 17 WHERE I_STATUS = 1 AND I_TYPE = 1

这个sql走二级索引idx_status_type, 于是先对二级索引idx_status_type对应记录I_STATUS = 1 , I_TYPE = 1, I_ID=2010712加锁, 加锁成功, 然后需要根据二级索引idx_status_type中记录的主键值I_ID=2010712去主键加锁, 但此锁已经被事务2持有

事务2持有主键锁, 等待在二级索引idx_status_type加记录锁

事务1持有idx_status_type记录锁, 等待在主键加记录锁

至此陷入死锁.

问题复现

这个死锁可以说是很巧合了, 手工无法模拟

我是这样的模拟的, 开三个窗口, 分别执行

1
while true;do mysql -uroot -p'123.com' -S /data/mysql_3306/run/mysql.sock fanboshi -e"begin;UPDATE f_log_2 SET D_UPDATED_AT='2020-05-18 16:11:11.693998',I_STATUS=3 WHERE (I_FILE_ID='250181420422912');commit;";done
1
while true;do mysql -uroot -p'123.com' -S /data/mysql_3306/run/mysql.sock fanboshi -e"UPDATE f_log_2 SET D_UPDATED_AT = NOW() , I_STATUS = 17 WHERE I_STATUS = 1 AND I_TYPE = 1";done
1
while true;do mysql -uroot -p'123.com' -S /data/mysql_3306/run/mysql.sock fanboshi -e"UPDATE f_log_2 SET I_STATUS = 1, I_TYPE=1 WHERE I_ID=9656 ";done

第三个窗口目的只是将I_STATUSI_TYPE列值更新回去, 制造条件, 不然前两个窗口语句不满足条件更新不到记录

复现出的死锁日志如下, 开起了innodb_show_verbose_locks参数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
2020-05-19T14:15:29.074829+08:00 11644 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.
2020-05-19T14:15:29.074841+08:00 11644 [Note] InnoDB:
*** (1) TRANSACTION:

TRANSACTION 19924, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 11643, OS thread handle 140502089639680, query id 40057 localhost root Searching rows for update
UPDATE f_log_2 SET D_UPDATED_AT = NOW() , I_STATUS = 17 WHERE I_STATUS = 1 AND I_TYPE = 1
2020-05-19T14:15:29.074858+08:00 11644 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED: --等待获取如下锁

RECORD LOCKS space id 137 page no 136 n bits 312 index PRIMARY of table `fanboshi`.`f_log_2` trx id 19924 lock_mode X locks rec but not gap waiting --主键记录锁
Record lock, heap no 217 PHYSICAL RECORD: n_fields 9; compact format; info bits 0
0: len 8; hex 80000000000025b8; asc % ;; 第一列, 根据表结构知道这列是主键, 16进制转10进制 9656
1: len 6; hex 000000004dd5; asc M ;; --该字段为6个字节的事务id,这个id表示最近一次被更新的事务id 000000004dd5 16进制转10进制结果为19925正是下面事务2的的事务号"TRANSACTION 19925"
2: len 7; hex 34000000051016; asc 4 ;; --该字段为7个字节的回滚指针,用于mvcc
3: len 8; hex 8000000000000001; asc ;; I_CHANNEL_ID = 1
4: len 15; hex 323530313831343230343232393132; asc 250181420422912;; I_FILE_ID 这一列是varchar类型. 0x32 0x35 0x30 0x31 0x38 0x31 0x34 0x32 0x30 0x34 0x32 0x32 0x39 0x31 0x32 转过来就是250181420422912
5: len 1; hex 81; asc ;; I_TYPE=1
6: len 1; hex 83; asc ;; I_STATUS=3
7: len 5; hex 99a2d8fd14; asc ;; D_CREATED_AT 不知道怎么转换
8: len 5; hex 99a66502cc; asc e ;; D_UPDATED_AT 不知道怎么转换
到这里已经能定位是哪一行了
mysql> select * from f_log_2 WHERE I_ID=9656;
+------+--------------+-----------------+--------+----------+---------------------+---------------------+
| I_ID | I_CHANNEL_ID | I_FILE_ID | I_TYPE | I_STATUS | D_CREATED_AT | D_UPDATED_AT |
+------+--------------+-----------------+--------+----------+---------------------+---------------------+
| 9656 | 1 | 250181420422912 | 1 | 1 | 2019-04-12 15:52:20 | 2020-05-18 16:11:12 |
+------+--------------+-----------------+--------+----------+---------------------+---------------------+
1 row in set (0.00 sec)

2020-05-19T14:15:29.074996+08:00 11644 [Note] InnoDB: *** (2) TRANSACTION:

TRANSACTION 19925, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 11644, OS thread handle 140502025062144, query id 40060 localhost root updating
UPDATE f_log_2 SET D_UPDATED_AT='2020-05-18 16:11:11.693998',I_STATUS=3 WHERE (I_FILE_ID='250181420422912')
2020-05-19T14:15:29.075010+08:00 11644 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 137 page no 136 n bits 312 index PRIMARY of table `fanboshi`.`f_log_2` trx id 19925 lock_mode X locks rec but not gap --持有主键记录锁
Record lock, heap no 217 PHYSICAL RECORD: n_fields 9; compact format; info bits 0
0: len 8; hex 80000000000025b8; asc % ;;
1: len 6; hex 000000004dd5; asc M ;;
2: len 7; hex 34000000051016; asc 4 ;;
3: len 8; hex 8000000000000001; asc ;;
4: len 15; hex 323530313831343230343232393132; asc 250181420422912;;
5: len 1; hex 81; asc ;;
6: len 1; hex 83; asc ;;
7: len 5; hex 99a2d8fd14; asc ;;
8: len 5; hex 99a66502cc; asc e ;;
持有TRANSACTION (1)等待获取的哪一行主键锁

2020-05-19T14:15:29.075141+08:00 11644 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 137 page no 17 n bits 1120 index idx_status_type of table `fanboshi`.`f_log_2` trx id 19925 lock_mode X locks rec but not gap waiting --等待二级索引列加锁
Record lock, heap no 526 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 1; hex 81; asc ;; I_STATUS = 1
1: len 1; hex 81; asc ;; I_TYPE = 1
2: len 8; hex 80000000000025b8; asc % ;; 这是主键值 1

2020-05-19T14:15:29.075197+08:00 11644 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)

可以看到和线上出现的死锁日志是吻合的

解决办法

将事务1语句改为先根据条件查询出主键值, 然后根据主键值更新.

Powered by Hexo and Hexo-theme-hiker

Copyright © 2013 - 2022 Fan() All Rights Reserved.

访客数 : | 访问量 :