MHA Tutorial

2017-07-19

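Simple failover

Building the replication environment

MHA does not set up replication for you, so you need to build the replication environment yourself. In other words, you can use MHA with an existing environment. As an example, suppose there are four hosts: host1, host2, host3, and host4. The master runs on host1, two slaves run on host2 and host3, and host4 runs MHA Manager.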

Installing MHA Node on host1-host4

See Installing MHA Node

Installing MHA Manager on host4

See Installing MHA Manager. The monitoring host needs both MHA Node and MHA Manager installed.

Creating the configuration file

manager_host$ cat /etc/app1.cnf

[server default]
# mysql user and password
user=root
password=mysqlpass
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

[server1]
hostname=host1

[server2]
hostname=host2

[server3]
hostname=host3

You do not need to specify host1 as the master; MHA detects the current master automatically.
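To make the layout of the file concrete, here is a minimal Python sketch that reads the same configuration and lists the MySQL hosts it defines. It is only an illustration: MHA does not parse its configuration this way, and the helper name `list_mysql_hosts` is made up for this example. It assumes that every section other than `[server default]` describes one MySQL server.

```python
import configparser

# The configuration from this tutorial, embedded so the sketch is self-contained.
APP1_CNF = """\
[server default]
# mysql user and password
user=root
password=mysqlpass
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

[server1]
hostname=host1

[server2]
hostname=host2

[server3]
hostname=host3
"""


def list_mysql_hosts(cnf_text):
    """Return the hostname of every per-server section, in file order."""
    # interpolation=None keeps a '%' in a password from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cnf_text)
    return [
        parser[section]["hostname"]
        for section in parser.sections()
        if section != "server default" and parser.has_option(section, "hostname")
    ]


print(list_mysql_hosts(APP1_CNF))
```

Nothing in the file marks host1 as the master, which is consistent with MHA working that out from the running servers.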

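Checking SSH connections

MHA Manager internally invokes programs included in the MHA Node package via SSH. MHA Node also sends differential relay log files to the non-latest slaves via SSH (scp). For these steps to run without manual interaction, SSH public key authentication must be configured. MHA Manager provides a simple script, masterha_check_ssh, to verify that passwordless SSH is set up correctly between the nodes.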

# masterha_check_ssh --conf=/etc/app1.cnf

Sat May 14 14:42:19 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat May 14 14:42:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Sat May 14 14:42:19 2011 - [info] Reading server configurations from /etc/app1.cnf..
Sat May 14 14:42:19 2011 - [info] Starting SSH connection tests..
Sat May 14 14:42:19 2011 - [debug]  Connecting via SSH from root@host1(192.168.0.1) to root@host2(192.168.0.2)..
Sat May 14 14:42:20 2011 - [debug]   ok.
Sat May 14 14:42:20 2011 - [debug]  Connecting via SSH from root@host1(192.168.0.1) to root@host3(192.168.0.3)..
Sat May 14 14:42:20 2011 - [debug]   ok.
Sat May 14 14:42:21 2011 - [debug]  Connecting via SSH from root@host2(192.168.0.2) to root@host1(192.168.0.1)..
Sat May 14 14:42:21 2011 - [debug]   ok.
Sat May 14 14:42:21 2011 - [debug]  Connecting via SSH from root@host2(192.168.0.2) to root@host3(192.168.0.3)..
Sat May 14 14:42:21 2011 - [debug]   ok.
Sat May 14 14:42:22 2011 - [debug]  Connecting via SSH from root@host3(192.168.0.3) to root@host1(192.168.0.1)..
Sat May 14 14:42:22 2011 - [debug]   ok.
Sat May 14 14:42:22 2011 - [debug]  Connecting via SSH from root@host3(192.168.0.3) to root@host2(192.168.0.2)..
Sat May 14 14:42:22 2011 - [debug]   ok.
Sat May 14 14:42:22 2011 - [info] All SSH connection tests passed successfully.
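The sample log above shows each MySQL host connecting to every other host, in both directions. As a rough illustration (this is not MHA's own code), the following Python sketch enumerates those ordered pairs for the three servers in this tutorial. The function name `ssh_test_pairs` is made up for this example.

```python
from itertools import permutations


def ssh_test_pairs(hosts):
    """Return every ordered (source, destination) pair of distinct hosts."""
    return list(permutations(hosts, 2))


pairs = ssh_test_pairs(["host1", "host2", "host3"])
for src, dst in pairs:
    print(f"root@{src} -> root@{dst}")
```

For n hosts this gives n * (n - 1) connections, which is six for the three servers here, matching the six checks in the log. Public key authentication therefore has to work in both directions between every pair of MySQL hosts.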

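Checking the replication configuration

For MHA to work, all masters and slaves defined in the configuration file must be running correctly. MHA Manager provides the masterha_check_repl command to check replication health quickly.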

manager_host$ masterha_check_repl --conf=/etc/app1.cnf
...
MySQL Replication Health is OK.

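If you hit any error here, check the logs and fix the problem. The current master must not itself be a slave (that is, it must not replicate from another MySQL server), and all other slaves must replicate from the master. The TypicalErrors page may help with fixing setup errors.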
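The two topology rules stated above can be written down as a small check. The Python sketch below is an illustration only: masterha_check_repl performs its checks against the live servers, whereas this sketch works on a hand-written, hypothetical dictionary `replicates_from` that maps each host to the host it replicates from (or `None` if it is not a slave).

```python
def check_topology(replicates_from):
    """Check the two rules: one non-slave master, all others replicate from it.

    Returns a (ok, message) tuple.
    """
    masters = [host for host, source in replicates_from.items() if source is None]
    if len(masters) != 1:
        return False, "expected exactly one server that is not a slave, found %d" % len(masters)
    master = masters[0]
    for host, source in replicates_from.items():
        if host != master and source != master:
            return False, "%s replicates from %s, not from master %s" % (host, source, master)
    return True, "master is %s" % master


# The layout used in this tutorial: host2 and host3 both replicate from host1.
print(check_topology({"host1": None, "host2": "host1", "host3": "host1"}))

# A chained layout (host3 replicates from host2) violates the second rule.
print(check_topology({"host1": None, "host2": "host1", "host3": "host2"}))
```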

Starting Manager

nohup masterha_manager --conf=/etc/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/masterha/app1/log/manager.log 2>&1 &

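If all settings are valid, masterha_manager keeps checking the availability of the MySQL master until the master goes down. By default masterha_manager prints its messages to the terminal (STDOUT/STDERR), but you can change the location with the manager_log parameter in the configuration file. Typical masterha_manager errors are an invalid MySQL replication configuration or an ssh_user without sufficient privileges (at minimum, permission to read the relay logs and to write to remote_workdir). By default masterha_manager runs in the foreground; the nohup and trailing & in the command above are what push it into the background. If you send SIGINT (Ctrl+C) to a foreground masterha_manager, it stops monitoring and exits.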

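Checking Manager status

Once MHA Manager starts monitoring the MySQL master, it prints no further log messages until the master becomes unreachable or the Manager itself is terminated. You can check whether MHA Manager is running with the masterha_check_status command.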

manager_host$ masterha_check_status --conf=/etc/app1.cnf
app1 (pid:5057) is running(0:PING_OK), master:host1

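"app1" is the application name that MHA uses internally; it is the prefix of the configuration file name.

If the Manager has been stopped or the configuration file is invalid, the following error is returned.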

manager_host$ masterha_check_status --conf=/etc/app1.cnf
app1 is stopped(1:NOT_RUNNING).
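If you want to act on this status from a script, the two output lines shown in this section can be picked apart with a regular expression. The Python sketch below is based only on those two sample lines, so treat the pattern as an assumption: other states or MHA versions may print something different. The names `STATUS_RE` and `parse_status` are made up for this example.

```python
import re

# Pattern derived from the two sample lines in this tutorial only.
STATUS_RE = re.compile(
    r"^(?P<app>\S+)"                  # application name, e.g. app1
    r"(?: \(pid:(?P<pid>\d+)\))?"     # present only while the Manager runs
    r" is (?P<state>\w+)"             # running / stopped
    r"\((?P<code>\d+):(?P<label>\w+)\)"  # e.g. (0:PING_OK)
    r"(?:, master:(?P<master>\S+?))?"    # present only while the Manager runs
    r"\.?$"
)


def parse_status(line):
    """Parse one masterha_check_status output line into a dict, or None."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    if fields["pid"] is not None:
        fields["pid"] = int(fields["pid"])
    return fields


print(parse_status("app1 (pid:5057) is running(0:PING_OK), master:host1"))
print(parse_status("app1 is stopped(1:NOT_RUNNING)."))
```

The numeric code in parentheses (0 for PING_OK, 1 for NOT_RUNNING in the samples) is the simplest field for a monitoring script to compare against.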

Stopping Manager

You can stop MHA Manager with the masterha_stop command.

manager_host$ masterha_stop --conf=/etc/app1.cnf
Stopped app1 successfully.

Testing master failover

Now MHA Manager monitors MySQL master server availability. Next, let’s test that master failover works correctly. To simulate this, you can simply kill mysqld on the master.

host1$ killall -9 mysqld mysqld_safe

On some distributions, such as Ubuntu, mysqld is restarted automatically by an angel process. If mysqld restarts very quickly (within a few seconds), pings from MHA succeed again before MHA starts failover, and in that case failover does not start. If restarting mysqld takes a long time (e.g. 2 minutes for InnoDB crash recovery), failover will start.

If killing mysqld is hard to test in your environment, or if you want to test a Linux kernel-side problem, triggering a kernel panic is an easy alternative.

host1# echo c > /proc/sysrq-trigger

Check the logs on the MHA Manager host, and verify that host2 becomes the new master and that host3 replicates from host2.

When failover completes (or ends with errors), the MHA Manager process stops. This is expected behavior. If you want to run MHA Manager permanently, please read the “Running MHA Manager in background” section.