/* 2. 其中 UNIQUE INDEX 为 uk_account(account_id, type) */
/* 3. 插入数据 */ insertinto t values(1,'1',1,100,1); insertinto t values(2,'2',1,100,1); insertinto t values(3,'3',1,100,1); insertinto t values(4,'4',1,100,1); insertinto t values(5,'5',1,100,1);
/* 4. 查询所有数据. */ select*from t;
/* 5. 执行以下 SQL, 注意事务隔离级别为 (RR) */ t1-1:(session1) begin; /* 显式开启事务, 排除 autocommit 的影响. */ select*from t where account_id ='1'and type =1for update;
t2: (session2) begin; select*from t where account_id ='1'and type =1for update;
t1-2: (session1) update t set state =2where account_id ='1';
我们按照顺序执行分别在 MySQL 8.0.17 和 MySQL 8.0.18 执行,可以看到在 8.0.17 版本事务 t2 因为死锁检测而被视为victim_trx进行了回滚,而 8.0.18 却不会回滚事务 t2.
基于 MySQL 8.0.17
1 2
MySQL [sbtest]>select*from t where account_id ='1'and type =1for update; ERROR 1213 (40001): Deadlock found when trying toget lock; try restarting transaction
分析流程
我们基于问题版本 8.0.17 来分析 Bug 的真正原因.
SQL 分析
通过表结构我们可以看到整个表有两个索引, PRIMARY INDEX 和 UNIQUE INDEX uk_account. 因为是死锁问题, 所以我们要逐条分析 SQL 语句加的 record lock 分别是什么:
t1-1 t1-1 是一条 SELECT FROM UPDATE 的语句, 而account_id和type是一组唯一索引字段, 所以只需要加一个主键索引的 X record lock 和唯一索引 uk_account 的 X record lock.
t2 t2 语句与 t1-1 相同, 加锁一致,也是一个主键索引的 X record lock 和 唯一索引 uk_account 的 X record lock.
>begin; >select*from t where account_id ='1'and type =1for update; > update t set state =2where account_id ='1'; >setglobal innodb_status_output_locks =on; /* 暂不提交,以查看事务锁的持有信息. */ >show engine innodb status\G
/* 事务 t1 持有 5 个 lock, 4 个 row lock, 1 个 table lock (暂时忽略). */ ---TRANSACTION 2068, ACTIVE 16 sec 5 lock struct(s), heap size 1200, 4row lock(s) MySQL thread id 9, OS thread handle 140737025267456, query id 74127.0.0.1 myadmin starting show engine innodb status
/* 下列为 table lock (笔者注). */ TABLE LOCK table `sbtest`.`t` trx id 2068 lock mode IX
/* 下列为 Record: 1,'1',1,100,1 的 X record lock 基于索引 uk_account (笔者注). */ RECORD LOCKS space id 2 page no5 n bits 72 index uk_account oftable `sbtest`.`t` trx id 2068 lock_mode X locks rec but not gap Record lock, heap no2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 1; hex 31; asc1;; 1: len 1; hex 01; asc ;; 2: len 8; hex 0000000000000001; asc ;;
/* 下列为 Record: 1,'1',1,100,1 的 X record lock 基于主键索引 (笔者注). */ RECORD LOCKS space id 2 page no4 n bits 72 index PRIMARYoftable `sbtest`.`t` trx id 2068 lock_mode X locks rec but not gap Record lock, heap no2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0 0: len 8; hex 0000000000000001; asc ;; 1: len 6; hex 000000000809; asc ;; 2: len 7; hex 01000001160151; asc Q;; 3: len 1; hex 31; asc1;; 4: len 1; hex 01; asc ;; 5: len 8; hex 0000000000000064; asc d;; 6: len 4; hex 00000002; asc ;;
/* 下列为 Record: 1,'1',1,100,1 的 Next-key record lock 基于索引 uk_account. (笔者注). */ RECORD LOCKS space id 2 page no5 n bits 72 index uk_account oftable `sbtest`.`t` trx id 2068 lock_mode X Record lock, heap no2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 1; hex 31; asc1;; 1: len 1; hex 01; asc ;; 2: len 8; hex 0000000000000001; asc ;;
/* 下列为 Record: 2,'2',1,100,1 的 GAP lock 基于索引 uk_account. (笔者注). */ RECORD LOCKS space id 2 page no5 n bits 72 index uk_account oftable `sbtest`.`t` trx id 2068 lock_mode X locks gap before rec Record lock, heap no3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 1; hex 32; asc2;; 1: len 1; hex 01; asc ;; 2: len 8; hex 0000000000000002; asc ;;
事务 t1 认为可能会发生的死锁原因是因为在整个 lock 的等待关系中存在一个环, 即 t1 不 commit 提交事务, t2 事务也无法获取 X record lock, 从而导致 t1-2 的 UPDATE 语句也无法获得 X record lock 组成 Next-key record lock, 即使 t1 已经持有了 X record lock.
Bug #23755664 DEADLOCK WITH 3 CONCURRENT DELETES BY UNIQUE KEY
PROBLEM: A deadlock was possible when a transaction tried to “upgrade” an already held Record Lock to Next Key Lock.
SOLUTION: This patch is based on observations that: (1) a Next Key Lock is equivalent to Record Lock combined with Gap Lock (2) a GAP Lock never has to wait for any other lock In case we request a Next Key Lock, we check if we already own a Record Lock of equal or stronger mode, and if so, then we either upgrade it to Next Key Lock, or if it is not possible (because the single lock_t struct is shared by more than one row) we change the requested lock type to GAP Lock, which we either already have, or can be granted immediately. (I don’t consider Insert Intention Locks a Gap Lock in above statements).
这个 Patch 具体的原理是当尝试获取 Next-key record lock 时,不再与旧的逻辑一样,旧的逻辑是先直接尝试申请 Next-key lock, 现在改为先判断当前 trx 是否持有 X record lock, 假如持有就复用这个 X record lock, 从而直接申请 GAP lock, 以达到 Next-key Lock 的效果.
所以在我们上面的例子中,申请 Next-key record lock 时跳过申请 X record lock, 就不会进入等待队列,也不会产生死锁的回环.
/* 使用 8.0.26 最新版分析 lock 持有情况. * 执行 t1: select * from t where account_id = '1' and type =1 for update; update t set state = 2 where account_id = '1'; show engine innodb status\G; */
show engine innodb status TABLE LOCK table `sbtest`.`t` trx id 1323 lock mode IX
/* 下列为 Record: 1,'1',1,100,1 的 X record lock 基于索引 uk_account. (笔者注). */ RECORD LOCKS space id 2 page no 5 n bits 72 index uk_account of table `sbtest`.`t` trx id 1323 lock_mode X locks rec but not gap Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 1; hex 01; asc ;; 2: len 8; hex 0000000000000001; asc ;;
/* 下列为 Record: 1,'1',1,100,1 的 X record lock 基于主键索引. (笔者注). */ RECORD LOCKS space id 2 page no 4 n bits 72 index PRIMARY of table `sbtest`.`t` trx id 1323 lock_mode X locks rec but not gap Record lock, heap no 2 PHYSICAL RECORD: n_fields 7; compact format; info bits 0 0: len 8; hex 0000000000000001; asc ;; 1: len 6; hex 00000000052b; asc +;; 2: len 7; hex 020000011b0110; asc ;; 3: len 1; hex 31; asc 1;; 4: len 1; hex 01; asc ;; 5: len 8; hex 0000000000000064; asc d;; 6: len 4; hex 00000002; asc ;;
/* 下列为 Record: 1,'1',1,100,1 的 GAP record lock 基于索引 uk_account, 与上面的 uk_account 的 X lock 组成 Next-key Lock. (笔者注). */ RECORD LOCKS space id 2 page no 5 n bits 72 index uk_account of table `sbtest`.`t` trx id 1323 lock_mode X locks gap before rec Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 1; hex 01; asc ;; 2: len 8; hex 0000000000000001; asc ;;
/* 下列为 Record: 2,'2',1,100,1 的 GAP record lock 基于索引 uk_account. (笔者注: lock 的类型与上列一致, InnoDB 在这里对相同类型的 lock 做了省略). */ Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 1; hex 32; asc 2;; 1: len 1; hex 01; asc ;; 2: len 8; hex 0000000000000002; asc ;
总结
根据例子我们分析了一个 InnoDB 的死锁场景, 以及 Bug 产生的原因. 通过 gdb 调试的方式分析 InnoDB 的死锁原因,最主要任务就是梳理整个锁的等待依赖关系, 这能帮助我们更直观的分析真正的原因. 这是一个 X record lock “升级” 至 Next-key record lock 的 Bug, 官方在 8.0.18 已经修复了这个存在了几年的问题.