MySQL(二十三):查看锁表和等待锁的事务,死锁排查

star2017 1年前 ⋅ 762 阅读

今天碰到个涉及微服务和事务的问题,出现了死锁,抛出了事务等待锁超时的问题。

查了半天最后确定是对微服务环境下的事务,及事务使用的认识上存在不足导致。

死锁

问题

系统是基于微服务架构开发(微服务共用一个数据库),同事 K 负责的 A 服务需要调 B 服务,B 服务要更新表后响应A服务,A 服务收到响应业务处理后会再次更新同一张表,之前是完全没问题的;

同事K 某个修改后导致 B 服务的 update 一直被阻塞,需要等待到 30 多秒后报事务等待锁超时的异常,微服务开发各自负责不同的服务,各自也不会关注其它服务的修改。

排查

  1. 问题是 B 服务抛出的,就得先从 B 服务排查,B 服务就一条简单的 update 语句,也没有开启事务,来回看代码查SQL都确定不了问题,这是代码层面的排查;
  2. 然后就从数据库层面开始查,查看数据库的事务使用情况,发现同一个表有两个事务,其中一个是等待锁的 update 语句事务,另一个占用锁的事务,但没有SQL语句;
  3. 这里问题就跟踪到了同一个表出现 2 个事务的问题,就从请求源头开始查整个请求链路,查看是那个环节开启了事务,但没有执行SQL;
  4. 最后是查到同事 K 的修改在微服务 A 调 B 服务的方法上添加了开启事务。确定了问题的根源后真的想说一万遍 fxxk.............。

原因

深层原因:一个表 2 个事务内部依赖就出现了死锁。

A 服务开启了事务,A 服务调 B 服务等待 B 响应再去更新,B 要先更新后才会响应,因 A 开启了事务占用了锁,B 提交 update 语句就一直等待锁,就出现了 A 等待 B 响应后处理才能释放锁,B 要等待 A 释放锁执行 update 语句完后才能响应,就出现了死循环。

事务使用

问题是查到了,解决也简单,去掉开启事务加个业务判断就行。

但也反应出另一个问题,就是对事务的使用,微服务环境下使用事务的认识不足,现总结以下几点。

  • 微服务环境下,只能开启本地事务,服务之间调用一定不能开启事务(指的不是分布式事务)。排查这类问题会被骂的很惨的。
    特别要注意微服务架构共用同一个库,同一个请求链路到不同服务可能操作同一个表可能导致死锁的问题。

  • 微服务调用之间的数据回滚完全可以在业务层面判断处理,不应依赖于事务。

  • 开启事务,通常是在执行多表操作时使用;

    而不是用在同一张表上的多次操作,这完全可以在业务层面处理。

查看锁和事务

查看是否锁表

mysql> SHOW OPEN TABLES where In_use > 0;

mysql> SHOW STATUS LIKE 'Table_locks%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Table_locks_immediate | 415   | 
| Table_locks_waited    | 0     |
+-----------------------+-------+

-- Table_locks_immediate:表被锁了总数
-- Table_locks_waited:表示有多少请求等待表锁

查看连接进程

-- 查看连接的进程,id 就是进程号
mysql> SHOW PROCESSLIST;
-- 杀掉进程号
mysql> kill id;

查看InnoDB事务

mysql> SELECT * from information_schema.INNODB_TRX
-- 杀掉事务进程
mysql> kill trx_mysql_thread_id

查看InnoDB引擎状态

mysql> SHOW ENGINE INNODB STATUS;

要查看 InnoDB 用户事务中的最近的死锁日志,可使用 SHOW ENGINE INNODB STATUS

返回结果中包括死锁相关事务的详细信息,如引发死锁的SQL,事务已经获得的锁,现在等待什么锁,以及被回滚的事务等等。

  • TRANSACTION:开启事务的表的数量,锁的数量。
  • HOLDS THE LOCK:持有所有的事务
  • WAITING FOR THIS LOCK TO BE GRANTED:等待锁被授权的事务
  • RECORD LOCKS space id 2315:以这个开头的内容可以看出 那个表的哪个索引被锁住了
| InnoDB |      |
=====================================
2020-09-18 13:47:30 0x7f3e75752700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 33 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 309525 srv_active, 0 srv_shutdown, 538473 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 444709
OS WAIT ARRAY INFO: signal count 1050283
RW-shared spins 3242974, rounds 3281034, OS waits 45284
RW-excl spins 287322, rounds 5030091, OS waits 38307
RW-sx spins 5729, rounds 49635, OS waits 440
Spin rounds per wait: 1.01 RW-shared, 17.51 RW-excl, 8.66 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2020-09-14 09:34:29 0x7f3fc24fd700
*** (1) TRANSACTION:
TRANSACTION 57745114, ACTIVE 2 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 35863, OS thread handle 139910653540096, query id 410554489 convtest.clear-sz.com 192.168.35.180 root updating
UPDATE xxl_job_registry
        SET `update_time` = '2020-09-14 09:34:27.004'
        WHERE `registry_group` = 'EXECUTOR'
          AND `registry_key` = 'pay-notify-executor'
          AND `registry_value` = 'http://192.168.35.192:9999/'

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 2315 page no 5 n bits 72 index i_g_k_v of table `xxl_job`.`xxl_job_registry` trx id 57745114 lock_mode X
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 4558454355544f52; asc EXECUTOR;;
 1: len 19; hex 7061792d6e6f746966792d6578656375746f72; asc pay-notify-executor;;
 2: len 27; hex 687474703a2f2f3139322e3136382e33352e3139323a393939392f; asc http://192.168.35.192:9999/;;
 3: len 4; hex 8000012c; asc    ,;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2315 page no 4 n bits 72 index PRIMARY of table `xxl_job`.`xxl_job_registry` trx id 57745114 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 7; compact format; info bits 32
 0: len 4; hex 8000012c; asc    ,;;
 1: len 6; hex 000003711ed4; asc    q  ;;
 2: len 7; hex 02000000e40d24; asc       $;;
 3: len 8; hex 4558454355544f52; asc EXECUTOR;;
 4: len 19; hex 7061792d6e6f746966792d6578656375746f72; asc pay-notify-executor;;
 5: len 27; hex 687474703a2f2f3139322e3136382e33352e3139323a393939392f; asc http://192.168.35.192:9999/;;
 6: len 5; hex 99a75c9876; asc   \ v;;


*** (2) TRANSACTION:
TRANSACTION 57745108, ACTIVE 40 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 2
MySQL thread id 35850, OS thread handle 139910826317568, query id 410505224 convtest.clear-sz.com 192.168.35.180 root updating
DELETE FROM xxl_job_registry
        WHERE id in
         (
            299
         ,
            300
         ,
            301
         )

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2315 page no 4 n bits 72 index PRIMARY of table `xxl_job`.`xxl_job_registry` trx id 57745108 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 7; compact format; info bits 32
 0: len 4; hex 8000012c; asc    ,;;
 1: len 6; hex 000003711ed4; asc    q  ;;
 2: len 7; hex 02000000e40d24; asc       $;;
 3: len 8; hex 4558454355544f52; asc EXECUTOR;;
 4: len 19; hex 7061792d6e6f746966792d6578656375746f72; asc pay-notify-executor;;
 5: len 27; hex 687474703a2f2f3139322e3136382e33352e3139323a393939392f; asc http://192.168.35.192:9999/;;
 6: len 5; hex 99a75c9876; asc   \ v;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2315 page no 5 n bits 72 index i_g_k_v of table `xxl_job`.`xxl_job_registry` trx id 57745108 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 4558454355544f52; asc EXECUTOR;;
 1: len 19; hex 7061792d6e6f746966792d6578656375746f72; asc pay-notify-executor;;
 2: len 27; hex 687474703a2f2f3139322e3136382e33352e3139323a393939392f; asc http://192.168.35.192:9999/;;
 3: len 4; hex 8000012c; asc    ,;;

*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 60548102
Purge done for trx's n:o < 60548102 undo n:o < 0 state: running but idle
History list length 4
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421387217563960, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217579656, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217578784, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217577912, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217570936, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217575296, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217574424, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217573552, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217565704, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217567448, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217561344, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217572680, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217571808, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217564832, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217563088, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217562216, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217588376, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217582272, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217570064, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217577040, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217569192, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421387217560472, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 4
808817809 OS file reads, 15547109 OS file writes, 8554804 OS fsyncs
1588.59 reads/s, 16384 avg bytes/read, 24.48 writes/s, 13.48 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 3089, seg size 3091, 151967 merges
merged operations:
 insert 152264, delete mark 45647, delete 1
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 134 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
2274.81 hash searches/s, 569.50 non-hash searches/s
---
LOG
---
Log sequence number          52194980033
Log buffer assigned up to    52194980033
Log buffer completed up to   52194980033
Log written up to            52194980033
Log flushed up to            52194980033
Added dirty pages up to      52194980033
Pages flushed up to          52194977125
Last checkpoint at           52194977125
8211520 log i/o's done, 12.47 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137363456
Dictionary memory allocated 3068565
Buffer pool size   8192
Free buffers       0
Database pages     8051
Old database pages 2952
Modified db pages  10
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 417285319, not young 3776215521
0.00 youngs/s, 0.00 non-youngs/s
Pages read 808817506, created 28900, written 6307328
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 890 / 1000, young-making rate 33 / 1000 not 601 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 8051, unzip_LRU len: 0
I/O sum[17887]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=18344, Main thread ID=139911724472064 , state=sleeping
Number of rows inserted 2046315, updated 365811, deleted 1948682, read 24113013375
3.61 inserts/s, 0.70 updates/s, 3.39 deletes/s, 45594.89 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

相关参考

  1. MySQL系列(十):MySQL InnoDB 锁机制
  2. How to Find Out Who Is Locking a Table in MySQL
  3. Mysql Insert 锁机制
更多内容请访问:IT源点

相关文章推荐

全部评论: 0

    我有话说: