Today a friend sent me a SQL query and asked me to help optimize it. Having been schooled by 落总, I could tell what was going on after a quick look.

```sql
SELECT ESS.PK_NO,
```

I asked him to run the SQL first and time it; he said 6 seconds.
OK, then comment out the two user-defined functions in the SELECT list, GET_DEPT_NAME and GET_GLOBAL_NAME, and run it again; he said 1.5 seconds.
So the SQL is slow because of those two functions: they are executed once for every row the query returns.
The functions can therefore be rewritten as scalar subqueries, and the scalar subqueries as LEFT JOINs.
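As a sketch of that rewrite (the table and column names below are made up for illustration, since the original query is not shown): a per-row function call first becomes a scalar subquery, and the scalar subquery then becomes a LEFT JOIN, which lets the optimizer resolve all rows in one pass instead of calling the function per row:

```sql
-- Hypothetical original: the function runs once per returned row
SELECT ESS.PK_NO, GET_DEPT_NAME(ESS.DEPT_ID) AS DEPT_NAME
FROM ESS;

-- Step 1: inline the function body as a scalar subquery
SELECT ESS.PK_NO,
       (SELECT D.DEPT_NAME FROM DEPT D WHERE D.DEPT_ID = ESS.DEPT_ID) AS DEPT_NAME
FROM ESS;

-- Step 2: turn the scalar subquery into a LEFT JOIN
SELECT ESS.PK_NO, D.DEPT_NAME
FROM ESS
LEFT JOIN DEPT D ON D.DEPT_ID = ESS.DEPT_ID;
```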
Grafana v4.0 added an Alerting feature (PMM 1.0.7 ships with Grafana 4.0). This article walks you through configuring your alerts.
After PMM is deployed, you will see the following interface.
Say you want to monitor the Threads_connected / Threads_running metrics.
Click the title of the corresponding graph, then click Edit.
Click Alert -> Create Alert as shown below to create the alert.
Unfortunately, when you try to create such an alert on metric A, Grafana reports the error "Template variables are not supported in alert queries."
What A stands for can be seen in the Metrics menu: the expression that fetches Threads_connected contains the variable $host, which is supplied by the Host drop-down menu the arrow points at.
For metrics that use template variables, alerts cannot be created.
Create a table on the master:

```sql
node1> create table fan(id int) engine=innodb;
Query OK, 0 rows affected (0.12 sec)
node1> insert into fan values(1);
Query OK, 1 row affected (0.07 sec)
```
Create an inconsistency on the slave:

```sql
node2> select * from fan;
+------+
| id   |
+------+
|    1 |
+------+
1 row in set (0.00 sec)
node2> update fan set id=2;
```
Turn on the general log on both sides, then run:

```shell
[root@node1 ~]# pt-table-checksum --nocheck-replication-filters --no-check-binlog-format h=172.16.83.17,u=root,p=mysql,P=3307 --replicate=percona.checksums --recursion-method=dsn=h=172.16.83.21,D=percona,t=dsns --databases=sysbench --tables=fan
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
11-06T05:12:58      0      1        1       1       0   0.344 sysbench.fan
```
First, look at the checksums table on both sides:

```sql
node1> select * from `percona`.`checksums` where tbl='fan';
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| db       | tbl | chunk | chunk_time | chunk_index | lower_boundary | upper_boundary | this_crc | this_cnt | master_crc | master_cnt | ts                  |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| sysbench | fan |     1 |   0.019798 | NULL        | NULL           | NULL           | 42981178 |        1 | 42981178   |          1 | 2017-11-06 05:12:58 |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+

node2> select * from `percona`.`checksums` where tbl='fan';
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| db       | tbl | chunk | chunk_time | chunk_index | lower_boundary | upper_boundary | this_crc | this_cnt | master_crc | master_cnt | ts                  |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
| sysbench | fan |     1 |   0.019798 | NULL        | NULL           | NULL           | 40deaf21 |        1 | 42981178   |          1 | 2017-11-06 05:12:58 |
+----------+-----+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+
1 row in set (0.04 sec)
```
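The DIFFS column comes from comparing this_crc/this_cnt against master_crc/master_cnt on each replica (here 40deaf21 vs 42981178). To list every table that differs, you can run the standard query from the pt-table-checksum documentation on the replica:

```sql
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE (master_cnt <> this_cnt
    OR master_crc <> this_crc
    OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;
```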
https://github.com/Fanduzi/pybackup
pybackup grew out of improving our production backup script and the need to monitor backups.
Originally the production databases were backed up by a shell script that called mydumper and then shipped the backup to the backup host with rsync.
The only way to get the backup status, duration, rsync transfer time, and so on was to parse logs.
pybackup is written in Python; it drives mydumper and rsync and stores the backup metadata in a database, so that backups can later be charted and monitored with Grafana.
Python 2.6 is not supported; it has only been tested on 2.7.14.
Help output:

```shell
[root@iZ23t8cwo3iZ backup_db]# python pybackup.py -h
Usage:
    pybackup.py mydumper ARG_WITH_NO_--... (([--no-rsync] [--no-history]) | [--only-backup])
    pybackup.py only-rsync [--backup-dir=<DIR>] [--bk-id=<id>] [--log-file=<log>]
    pybackup.py -h | --help
    pybackup.py --version

Options:
    -h --help             Show help information.
    --version             Show version.
    --no-rsync            Do not use rsync.
    --no-history          Do not record backup history information.
    --only-backup         Equal to use both --no-rsync and --no-history.
    --only-rsync          When you backup complete, but rsync failed, use this option to rsync your backup.
    --backup-dir=<DIR>    The directory where the backup files need to be rsync are located. [default: ./]
    --bk-id=<id>          bk-id in table user_backup.
    --log-file=<log>      log file [default: ./rsync.log]

more help information in:
https://github.com/Fanduzi
```
```
pybackup.py mydumper ARG_WITH_NO_--... (([--no-rsync] [--no-history]) | [--only-backup])
```

Apart from the last three options, every argument is the same as the ones listed by mydumper -h; the only difference is that for now only long options are supported, and they are written without the leading '--'.
The example below should make clear why.
A production dual-master setup:

```
CentOS release 6.8 (Final)
Server version: 5.5.56
binlog_format : STATEMENT
tx_isolation  : REPEATABLE-READ
master 1 server_id : 32
master 2 server_id : 33
```
There is a table that receives a LOAD DATA every minute. Since nearly 100 million rows are inserted per day, disk usage grows fast, so a cron job now rotates the table every four days:

```shell
# mobile_ad_50 table rotation
0 3 2,6,10,14,19,23,27 * * source /etc/profile;source /root/.bash_profile;sh /data/scripts/bin/mobile_ad_50.sh >>/data/scripts/log/mobile_ad_50.log
```
The switch logic is to rename the source table first, then rebuild it:

```shell
${DB_COMMAND} dbe8je6i4c3gjd50 -ss -e "drop table mobile_ad_50_20170531"
echo "drop ok"
${DB_COMMAND} dbe8je6i4c3gjd50 -ss -e "rename table mobile_ad_50 to mobile_ad_50_20170531"
echo "rename ok"
${DB_COMMAND} dbe8je6i4c3gjd50 -ss -e "CREATE TABLE mobile_ad_50 (
```
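One thing worth noting about the script above (my observation, not the original author's): between the RENAME and the re-CREATE there is a short window in which mobile_ad_50 does not exist, and a LOAD DATA landing in that window would fail. If an empty mobile_ad_50_new is created in advance, a multi-table RENAME swaps the tables atomically:

```sql
-- Assumes mobile_ad_50_new was pre-created with the same structure;
-- a multi-table RENAME TABLE is atomic, so no statement ever sees the table missing.
RENAME TABLE mobile_ad_50     TO mobile_ad_50_20170531,
             mobile_ad_50_new TO mobile_ad_50;
```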
Table structure:

```sql
*************************** 1. row ***************************
       Table: mobile_ad_50
Create Table: CREATE TABLE `mobile_ad_50` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
  `dtime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00' COMMENT '时间段 年月日时',
  `union_id` int(11) unsigned NOT NULL DEFAULT '1' COMMENT '媒体ID',
  `ad_id` varchar(100) DEFAULT NULL COMMENT '广告位ID',
  `ifa` varchar(50) NOT NULL COMMENT 'ifa',
  `mac` varchar(50) NOT NULL COMMENT 'mac',
  `cb_url` varchar(1000) NOT NULL COMMENT '回调地址',
  `state` tinyint(1) NOT NULL DEFAULT '0' COMMENT '是否激活',
  `domain` varchar(30) NOT NULL COMMENT '游戏域名',
  `game_code` varchar(50) NOT NULL DEFAULT '' COMMENT '游戏编码',
  `union_app_id` char(50) NOT NULL DEFAULT '' COMMENT '渠道商的appid',
  `openudid` char(50) NOT NULL DEFAULT '' COMMENT '开源广告标示符',
  `is_send` tinyint(1) NOT NULL DEFAULT '0' COMMENT '同步次数a',
  `ip` bigint(20) NOT NULL DEFAULT '0' COMMENT '点击ip',
  `actip` bigint(20) NOT NULL DEFAULT '0' COMMENT '激活ip',
  `opentime` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '打开时间',
  `acttime` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '激活时间',
  `is_open` tinyint(1) NOT NULL DEFAULT '0' COMMENT '是否打开',
  PRIMARY KEY (`id`),
  KEY `ifa` (`ifa`),
  KEY `d_u_s` (`domain`,`union_id`,`state`),
  KEY `union_id` (`union_id`),
  KEY `mac` (`mac`),
  KEY `dtime` (`dtime`),
  KEY `ip` (`ip`),
  KEY `actip` (`actip`),
  KEY `union_app_id` (`union_app_id`),
  KEY `openudid` (`openudid`),
  KEY `state` (`state`),
  KEY `acttime` (`acttime`)
) ENGINE=InnoDB AUTO_INCREMENT=6154739813 DEFAULT CHARSET=utf8 COMMENT='手机广告'
1 row in set (0.00 sec)
```
One problem: if you run `nohup python binlog_server.py &` and then kill the script, the script itself dies, but `ps -ef | grep mysqlbinlog` shows that mysqlbinlog is still running.
Another small problem: if you run it on the command line and give the password in plain text, `ps -ef | grep mysqlbinlog` shows the password directly...
Writing the password into the config file is not very secure either. In short, it still needs polishing.
Specify a different dbname to pull binlogs from a different database; dbname is a section name in the config file.
```sql
GRANT REPLICATION SLAVE ON *.* TO 'binlog_backup'@'106.3.130.255' IDENTIFIED BY 'xxx'
```
Add to fw.sh:

```shell
# back up binlog
$IPTABLES -A INPUT -p tcp -s 106.3.130.255 --dport 3306 -j ACCEPT
```
```shell
[root@localhost 120.27.143.36]# less /scripts/binlog_server.cnf
```
There are two ways to run it.

1. Via a config file:

```shell
python /scripts/binlog_server.py --config=/tmp/binlog_server.cnf --dbname=GN_PT_SLAVE1
```

2. Via command-line options. Note that backup-dir must end with '/':

```shell
python binlog_server.py --user=binlog_backup --password=xxxx --host=xxxx --port=3306 --backup-dir=/data4/binlog_backup/ --log=/data4/binlog_backup/BB.log
```
The script creates /tmp/IP_binlog_server.lock to prevent two copies from running at once.
If you need to stop it, kill binlog_server.py and mysqlbinlog by hand and delete /tmp/IP_binlog_server.lock, otherwise it won't start next time.
Check the various logs.
Monitoring service logs:

```shell
[root@node4 ~]# ll /var/log/pmm-*
-rw-r--r--. 1 root root 1880 Sep 27 18:02 /var/log/pmm-linux-metrics-42000.log
-rw-r--r--. 1 root root  783 Sep 27 18:02 /var/log/pmm-mysql-metrics-42002.log
-rw-r--r--. 1 root root 7143 Sep 27 18:08 /var/log/pmm-mysql-queries-0.log
```
Logs inside Docker:

```shell
[root@node4 log]# docker ps -a
CONTAINER ID   IMAGE                       COMMAND                CREATED       STATUS          PORTS                         NAMES
e4916410b314   percona/pmm-server:latest   "/opt/entrypoint.sh"   2 hours ago   Up 31 minutes   0.0.0.0:80->80/tcp, 443/tcp   pmm-server
948a9aeb047e   percona/pmm-server:latest   "/bin/true"            2 hours ago   Created                                       pmm-data
[root@node4 log]# docker logs e4916410b314
2017-09-27 08:39:47,175 CRIT Supervisor running as root (no user in config file)
2017-09-27 08:39:47,175 WARN Included extra file "/etc/supervisord.d/pmm.ini" during parsing
Unlinking stale socket /var/run/supervisor/supervisor.sock
2017-09-27 08:39:47,527 INFO RPC interface 'supervisor' initialized
2017-09-27 08:39:47,528 INFO supervisord started with pid 1
2017-09-27 08:39:48,536 INFO spawned: 'mysql' with pid 15
2017-09-27 08:39:48,543 INFO spawned: 'consul' with pid 16
2017-09-27 08:39:48,552 INFO spawned: 'grafana' with pid 17
2017-09-27 08:39:48,563 INFO spawned: 'nginx' with pid 18
2017-09-27 08:39:48,610 INFO spawned: 'cron' with pid 19
2017-09-27 08:39:48,612 INFO spawned: 'qan-api' with pid 20
```
Go into the container and look around:

```shell
docker exec -it e4916410b314 /bin/bash
```

There are all kinds of logs under /var/log, e.g.:
/var/log/grafana/grafana.log
/var/log/prometheus.log
pmm-admin list showing yes does not mean everything is fine; run check-network to check:

```shell
[root@node4 ~]# pmm-admin check-network
PMM Network Status

Server Address | 172.16.83.103
Client Address | 172.16.83.103

* System Time
NTP Server (0.pool.ntp.org)         | 2017-09-27 17:13:58 +0800 CST
PMM Server                          | 2017-09-27 09:13:58 +0000 GMT
PMM Client                          | 2017-09-27 17:13:58 +0800 CST
PMM Server Time Drift               | OK
PMM Client Time Drift               | OK
PMM Client to PMM Server Time Drift | OK

* Connection: Client --> Server
-------------------- -------
SERVER SERVICE       STATUS
-------------------- -------
Consul API           OK
Prometheus API       OK
Query Analytics API  OK

Connection duration | 551.133µs
Request duration    | 2.467879ms
Full round trip     | 3.019012ms

* Connection: Client <-- Server
-------------- ------ -------------------- ------- ---------- ---------
SERVICE TYPE   NAME   REMOTE ENDPOINT      STATUS  HTTPS/TLS  PASSWORD
-------------- ------ -------------------- ------- ---------- ---------
linux:metrics  node4  172.16.83.103:42000  OK      YES        -
mysql:metrics  node4  172.16.83.103:42002  OK      YES        -
```
PMM uses Grafana for visualization. By default anonymous access is enabled, which means anyone can view and modify dashboards without a username or password.
But the boss said: no way can people connect without credentials.

```shell
[root@localhost ~]# docker ps -a
```
Look at grafana.ini:

```shell
[root@c74f5be8ed88 opt]# vi /etc/grafana/grafana.ini
```

Find this section:

```ini
#################################### Anonymous Auth ##########################
[auth.anonymous]
# enable anonymous access
#enabled = True
```

Comment out `enabled = True`, which disables anonymous login.
Now if you restart the container and open the page again, you'll find you can't get in... What now?
Go into the container:

```shell
# log into Grafana's database
sqlite3 /var/lib/grafana/grafana.db
```

Update the user table to set the admin password back to admin:

```sql
update user set password = '59acf18b94d7eb0694c61e60ce44c110c7a683ac6a8f09580d626f90f4a242000746579358d77dd9e570e83fa24faa88a8a6', salt = 'F3FAxVm33R' where login = 'admin';
```

To be safer, you can instead set the admin password to TdPXP4sg:

```sql
update user set password='11cf3a1ee21b046b939b5f0cdc9d92ab70ba66e4e53f301fb2456ee7b6a665d8abf0d5b387ae0ec53f5f5fc8e477bfbe073e',salt='AHxOW2Fn34',name='admin',is_admin=1 where login='admin';
```

Create a monitor user with password mj8toYLB:

```sql
INSERT INTO "user" VALUES(3,0,'monitor','monitor@papapa.com','monitor','98c8e341360759e957ac43e2543fab4eef420a3521450d03ad79d5a1dd76dee233a9ec11870264c2e4dd7266d1a1f68681c2','erShkEJCWn','Y9TF6hFebE','',1,0,0,'','2017-09-28 10:21:10','2017-09-28 10:21:10',0);
```
Run a sysbench workload against the source, to test later whether kill -HUP has any impact:

```shell
sysbench /usr/share/sysbench/oltp_read_write.lua --mysql-user=root --mysql-password=mysql --mysql-port=3306 \
--mysql-socket=/data/mysqldata/3306/mysql.sock --mysql-host=localhost \
--mysql-db=sysbenchtest --tables=10 --table-size=5000000 --threads=30 \
--events=5000000 --report-interval=5 --db-driver=mysql prepare

sysbench \
/usr/share/sysbench/oltp_read_write.lua \
--mysql-user=root --mysql-password=mysql --mysql-port=3306 \
--mysql-socket=/data/mysqldata/3306/mysql.sock --mysql-host=localhost \
--mysql-db=sysbenchtest --tables=10 --table-size=5000000 \
--threads=30 --report-interval=5 --time=7000 --db-driver=mysql run > binlog_off.txt
```
Copy the mysql.user table from the instance whose password was forgotten to a test instance whose password is known:

```shell
[root@uz22199 mysql]# scp -p user.* 10.4.3.100:/data/mysqldata/3306/data/fandb/
root@10.4.3.100's password:
user.frm        100%   11KB  10.6KB/s   00:00
user.MYD        100%  736     0.7KB/s   00:00
user.MYI        100% 4096     4.0KB/s   00:00
```
Generate the INSERT statement; we'll use it later:

```shell
[root@test43100 ~]# mysqldump --user=root --password='mysql' fandb user --where="host='localhost' and user='root'" |grep INSERT
mysqldump: [Warning] Using a password on the command line interface can be insecure.
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.
INSERT INTO `user` VALUES ('localhost','root','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','Y','','','','',0,0,0,0,'mysql_native_password','*81F5E21E35407D884A6CD4A731AEBFB6AF209E1B','N','2017-08-04 08:12:53',NULL,'N');
```
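As an aside (my note, not from the original post): the '*81F5E21E...' value in that row is a mysql_native_password hash, i.e. '*' followed by the uppercase hex SHA1 of the SHA1 of the plaintext. You can reproduce such a hash in SQL, for example to check a guessed password against the dumped row:

```sql
-- The classic expression behind the old PASSWORD() function:
-- '*' + HEX(SHA1(SHA1(plaintext)))
SELECT CONCAT('*', UPPER(SHA1(UNHEX(SHA1('password'))))) AS hash;
-- returns *2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19
```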
Find the position of the offending statement; here it is transaction 1315:

```
#170911 14:42:19 server id 330641  end_log_pos 2508 CRC32 0xf697b7bb  Xid = 215
COMMIT/*!*/;
# at 2508
#170911 14:50:56 server id 330641  end_log_pos 2573 CRC32 0x8309dea0  GTID  last_committed=9  sequence_number=10
SET @@SESSION.GTID_NEXT= '5c351518-78ec-11e7-8e7a-005056a610c3:1315'/*!*/;
# at 2573
#170911 14:50:56 server id 330641  end_log_pos 2693 CRC32 0xb06b9074  Query  thread_id=25  exec_time=1  error_code=0
use `fandb`/*!*/;
SET TIMESTAMP=1505112656/*!*/;
DROP TABLE `users` /* generated by server */
/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
```
Take that backup, restore it, start it up, run reset slave all, then change master:

```sql
change master to
master_host='10.4.3.100',
master_user='repl',
master_password='repl',
master_port=3306,
master_auto_position=1;
```
The key part:

```sql
start slave sql_thread until SQL_BEFORE_GTIDS='5c351518-78ec-11e7-8e7a-005056a610c3:1315';
start slave io_thread;
```
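With SQL_BEFORE_GTIDS, the SQL thread stops just before applying transaction :1315 (the DROP TABLE), while the IO thread keeps pulling binlogs. A quick sanity check (a sketch; the exact GTID set depends on your instance):

```sql
SHOW SLAVE STATUS\G
-- Until_Condition should read SQL_BEFORE_GTIDS, and Slave_SQL_Running
-- turns to No once the stop point is reached.
SELECT @@GLOBAL.GTID_EXECUTED;
-- should cover 5c351518-78ec-11e7-8e7a-005056a610c3 only up to :1314
```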