为什么双主只建议单节点写入?

为什么双主只建议单节点写入

通过下面的案例,你应该可以明白为啥了

问题描述

线上一套双主环境

1
2
3
4
5
6
CentOS release 6.8 (Final)
Server version: 5.5.56
binlog_format : STATEMENT
tx_isolation : REPEATABLE-READ
主1 server_id : 32
主2 server_id : 33

有一个表,每分钟load data. 由于一天会插入近1亿行数据,导致磁盘使用率增长很快,所以现在用计划任务每四天切换一次表
1
2
#mobile_ad_50表切换
0 3 2,6,10,14,19,23,27 * * source /etc/profile;source /root/.bash_profile;sh /data/scripts/bin/mobile_ad_50.sh >>/data/scripts/log/mobile_ad_50.log

切换逻辑是,先rename源表,再重建表
1
2
3
4
5
${DB_COMMAND} dbe8je6i4c3gjd50 -ss -e "drop table mobile_ad_50_20170531"
echo "drop ok"
${DB_COMMAND} dbe8je6i4c3gjd50 -ss -e "rename table mobile_ad_50 to mobile_ad_50_20170531"
echo "rename ok"
${DB_COMMAND} dbe8je6i4c3gjd50 -ss -e "CREATE TABLE mobile_ad_50 (

表结构
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
*************************** 1. row ***************************
Table: mobile_ad_50
Create Table: CREATE TABLE `mobile_ad_50` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`dtime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00' COMMENT '时间段 年月日时',
`union_id` int(11) unsigned NOT NULL DEFAULT '1' COMMENT '媒体ID',
`ad_id` varchar(100) DEFAULT NULL COMMENT '广告位ID',
`ifa` varchar(50) NOT NULL COMMENT 'ifa',
`mac` varchar(50) NOT NULL COMMENT 'mac',
`cb_url` varchar(1000) NOT NULL COMMENT '回调地址',
`state` tinyint(1) NOT NULL DEFAULT '0' COMMENT '是否激活',
`domain` varchar(30) NOT NULL COMMENT '游戏域名',
`game_code` varchar(50) NOT NULL DEFAULT '' COMMENT '游戏编码',
`union_app_id` char(50) NOT NULL DEFAULT '' COMMENT '渠道商的appid',
`openudid` char(50) NOT NULL DEFAULT '' COMMENT '开源广告标示符',
`is_send` tinyint(1) NOT NULL DEFAULT '0' COMMENT '同步次数a',
`ip` bigint(20) NOT NULL DEFAULT '0' COMMENT '点击ip',
`actip` bigint(20) NOT NULL DEFAULT '0' COMMENT '激活ip',
`opentime` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '打开时间',
`acttime` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '激活时间',
`is_open` tinyint(1) NOT NULL DEFAULT '0' COMMENT '是否打开',
PRIMARY KEY (`id`),
KEY `ifa` (`ifa`),
KEY `d_u_s` (`domain`,`union_id`,`state`),
KEY `union_id` (`union_id`),
KEY `mac` (`mac`),
KEY `dtime` (`dtime`),
KEY `ip` (`ip`),
KEY `actip` (`actip`),
KEY `union_app_id` (`union_app_id`),
KEY `openudid` (`openudid`),
KEY `state` (`state`),
KEY `acttime` (`acttime`)
) ENGINE=InnoDB AUTO_INCREMENT=6154739813 DEFAULT CHARSET=utf8 COMMENT='手机广告'
1 row in set (0.00 sec)


开发说load只在主1执行,并且这个表数据都是通过load进来的,然后有些update,就再没有其他insert语句了
现在发现问题就是发现auto_increment异常增大,表中有两亿数据时,auto_increment列有51亿.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
mysql> select id,dtime from mobile_ad_50 order by id limit 0,20;
+------------+---------------------+
| id | dtime |
+------------+---------------------+
| 2 | 2017-10-29 03:00:56 |
| 4 | 2017-10-29 03:00:56 |
| 6 | 2017-10-29 03:00:56 |
| 8 | 2017-10-29 03:00:57 |
| 10 | 2017-10-29 03:00:57 |
| 12 | 2017-10-29 03:00:57 |
| 14 | 2017-10-29 03:00:57 |
| 16 | 2017-10-29 03:00:57 |
| 18 | 2017-10-29 03:00:57 |
| 20 | 2017-10-29 03:00:57 |
| 22 | 2017-10-29 03:00:57 |
| 43 | 0000-00-00 00:00:00 |
| 5135418110 | 2017-10-29 03:00:10 |
| 5135418111 | 2017-10-29 03:00:00 |
| 5135418113 | 2017-10-29 03:00:00 |
| 5135418115 | 2017-10-29 03:00:00 |
| 5135418117 | 2017-10-29 03:00:00 |
| 5135418119 | 2017-10-29 03:00:00 |
| 5135418121 | 2017-10-29 03:00:00 |
| 5135418123 | 2017-10-29 03:00:00 |
+------------+---------------------+
20 rows in set (0.00 sec)

看上面的查询是按照主键排序的,id=43一下就涨到51亿
我怀疑是取到了rename之前的表的自增值,查看了一下,还真是
1
2
3
4
5
6
7
mysql> select max(id) from mobile_ad_50_20170531;
+------------+
| max(id) |
+------------+
| 5135418109 |
+------------+
1 row in set (0.00 sec)

自己分析原因一种是插入大量数据后rollback导致自增丢失,但是实际一次load data文件也就几千行,不可能丢失这么多.
第二个可能的原因是innodb_autoinc_lock_mode=1导致的bulk insert时自增丢失,但是我模拟了load 两亿数据,自增id也就2亿1千多万,也不可能丢失到51亿
我观察binlog,找2017-10-29 03:00:10左右的,搜索5135418110,找到
1
2
3
4
5
6
7
8
9
10
11
12
13
BEGIN
/*!*/;
# at 119255971
#171029 3:00:10 server id 33 end_log_pos 119255999 Intvar
SET INSERT_ID=5135418110/*!*/;
# at 119255999
#171029 3:00:10 server id 33 end_log_pos 119256303 Query thread_id=227034786 exec_time=6 error_code=0
SET TIMESTAMP=1509217210/*!*/;
INSERT INTO mobile_ad_50 ( DTIME, UNION_ID, AD_ID, IFA, MAC, CB_URL, STATE, DOMAIN, GAME_CODE, OPENTIME, IS_OPEN ) VALUES ( '2017-10-29 03:00:10', 14800, '', '865166021645612', '', '', 0, '520050', 'android', 1509217210, 1 )
/*!*/;
# at 119256303
#171029 3:00:10 server id 33 end_log_pos 119256330 Xid = 99754915507
COMMIT/*!*/;

这条之前的一个SET INSERT_ID是9000多万
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
BEGIN
/*!*/;
# at 119256931
#171029 3:00:11 server id 33 end_log_pos 119257377 Query thread_id=227034792 exec_time=5 error_code=0
SET TIMESTAMP=1509217211/*!*/;
UPDATE wechat SET ACCESS_TOKEN='oEBoeHxXah3WEPAm_pJbQ-E2dVR2WVTkXn0mQ7YfY20grgt3k29-e518F1OELmHepZumWfxjNuDO7agVyNZZnkfG_xao-yWbfRv90x1ZoN_uQ1ogvsyJazUIVygldMcBBGWdAHAIND', ATOKEN_EXPIRES='2017-10-29 04:58:11', JSAPI_TICKET='kgt8ON7yVITDhtdwci0qeXl3u2D35Jw6KZsyUHYlRNK5VfCPXbMWbtYLkPOWe2hDlrlrly_FyrO3yjhXqhSezg', JSTICKET_EXPIRES='2017-10-29 04:58:11' WHERE id = 20
/*!*/;
SET INSERT_ID=90238492/*!*/;
# at 119254523
#171029 3:00:08 server id 33 end_log_pos 119254827 Query thread_id=227034786 exec_time=8 error_code=0
SET TIMESTAMP=1509217208/*!*/;
INSERT INTO mobile_ad_55 ( DTIME, UNION_ID, AD_ID, IFA, MAC, CB_URL, STATE, DOMAIN, GAME_CODE, OPENTIME, IS_OPEN ) VALUES ( '2017-10-29 03:00:08', 10332, '', '863777021706899', '', '', 0, '520055', 'android', 1
509217208, 1 )
/*!*/;
# at 119254827
#171029 3:00:08 server id 33 end_log_pos 119254854 Xid = 99754915501
COMMIT/*!*/;

开发不是说没有其他的insert语句了吗???怎么这里有一条,看一下server id 33还是主2同步过来的
在分析一下两边binlog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
1
#171029 3:00:15 server id 32 end_log_pos 118195605 Query thread_id=2636976237 exec_time=0 error_code=0
SET TIMESTAMP=1509217215/*!*/;
rename table mobile_ad_50 to mobile_ad_50_20170531


#171029 3:00:10 server id 33 end_log_pos 119255999 Intvar
SET INSERT_ID=5135418110/*!*/;
# at 119255999
#171029 3:00:10 server id 33 end_log_pos 119256303 Query thread_id=227034786 exec_time=6 error_code=0
SET TIMESTAMP=1509217210/*!*/;
INSERT INTO mobile_ad_50 ( DTIME, UNION_ID, AD_ID, IFA, MAC, CB_URL, STATE, DOMAIN, GAME_CODE, OPENTIME, IS_OPE



2
SET TIMESTAMP=1509217215/*!*/;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/;
rename table mobile_ad_50 to mobile_ad_50_20170531
/*!*/;

#171029 3:00:10 server id 33 end_log_pos 268758734 Intvar
SET INSERT_ID=5135418110/*!*/;
# at 268758734
#171029 3:00:10 server id 33 end_log_pos 268759038 Query thread_id=227034786 exec_time=0 error_code=0
SET TIMESTAMP=1509217210/*!*/;

问题原因

现在分析一下
3:00:10 主2 SET INSERT_ID=5135418110 insert mobile_ad_50
3:00:15 主1 rename table

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
3.00.1012
SET INSERT_ID=5135418110 insert mobile_ad_50源
<---通过binlog同步给主1---
由于复制延迟,主1这一条insert还没执行
3.00.1512
rename ---通过binlog同步给主2---> rename

这时主1才执行SET INSERT_ID=5135418110 insert mobile_ad_50,主1新创建的mobile_ad_50表"继承了"原来的mobile_ad_50也就是mobile_ad_50_20170531表的ID,导致了ID暴增

证据是5135418110在主1的 mobile_ad_50中,
而主2的mobile_ad_50中没有,
实际上5135418110在主2的mobile_ad_50_20170531中
mysql> select id,dtime from mobile_ad_50 where id=5135418110;
Empty set (0.00 sec)

mysql> select id,dtime from mobile_ad_50_20170531 where id=5135418110;
+------------+---------------------+
| id | dtime |
+------------+---------------------+
| 5135418110 | 2017-10-29 03:00:10 |
+------------+---------------------+

Powered by Hexo and Hexo-theme-hiker

Copyright © 2013 - 2022 Fan() All Rights Reserved.

访客数 : | 访问量 :