环境概要

  • 操作系统:Arch Linux
  • 数据库版本:MariaDB Community 10.5.5-2
  • MHA版本:0.58
  • 机器IP
    • MySQL Master: 192.168.5.131
    • MySQL Slave One: 192.168.5.130
    • MySQL Slave Two: 192.168.5.132
    • MySQL MHA Manager: 192.168.5.133

备注:选用MariaDB而非MySQL是因为Arch Linux官方软件包仓库使用MariaDB作为MySQL的默认实现,详见: https://www.archlinux.org/news/mariadb-replaces-mysql-in-repositories/ 。MySQL存在于AUR仓库中,但是由于登录的是root用户,yay`等工具不支持root用户安装AUR软件(详见: https://github.com/Jguer/yay/issues/1026 ),故此处直接使用MariaDB。

安装MariaDB

在MySQL Master和两台机器上分别执行以下命令来完成MariaDB的安装与配置。

安装最新版MariaDB:

1
pacman -S mariadb

安装日志如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
➜  ~ pacman -S mariadb
resolving dependencies...
looking for conflicting packages...

Packages (7) inetutils-1.9.4-8 jemalloc-1:5.2.1-3 libaio-0.3.112-2 lzo-2.10-3 mariadb-clients-10.5.5-2 mariadb-libs-10.5.5-2
mariadb-10.5.5-2

Total Download Size: 36.45 MiB
Total Installed Size: 305.30 MiB

:: Proceed with installation? [Y/n]
:: Retrieving packages...
libaio-0.3.112-2-x86_64 6.3 KiB 0.00 B/s 00:00 [###############################################] 100%
lzo-2.10-3-x86_64 82.5 KiB 851 KiB/s 00:00 [###############################################] 100%
inetutils-1.9.4-8-x86_64 295.7 KiB 1183 KiB/s 00:00 [###############################################] 100%
mariadb-libs-10.5.5-2-x86_64 5.6 MiB 753 KiB/s 00:08 [###############################################] 100%
jemalloc-1:5.2.1-3-x86_64 351.2 KiB 650 KiB/s 00:01 [###############################################] 100%
mariadb-clients-10.5.5-2-x86_64 1952.9 KiB 1172 KiB/s 00:02 [###############################################] 100%
mariadb-10.5.5-2-x86_64 28.2 MiB 2.31 MiB/s 00:12 [###############################################] 100%
(7/7) checking keys in keyring [###############################################] 100%
(7/7) checking package integrity [###############################################] 100%
(7/7) loading package files [###############################################] 100%
(7/7) checking for file conflicts [###############################################] 100%
(7/7) checking available disk space [###############################################] 100%
:: Processing package changes...
(1/7) installing libaio [###############################################] 100%
(2/7) installing lzo [###############################################] 100%
(3/7) installing mariadb-libs [###############################################] 100%
Optional dependencies for mariadb-libs
krb5: for gssapi authentication [installed]
(4/7) installing jemalloc [###############################################] 100%
Optional dependencies for jemalloc
perl: for jeprof [installed]
(5/7) installing mariadb-clients [###############################################] 100%
(6/7) installing inetutils [###############################################] 100%
(7/7) installing mariadb [###############################################] 100%
:: You need to initialize the MariaDB data directory prior to starting
the service. This can be done with mariadb-install-db command, e.g.:
mariadb-install-db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
Optional dependencies for mariadb
curl: for ha_s3 plugin [installed]
galera: for MariaDB cluster with Galera WSREP
mysql-python: for myrocks_hotbackup
perl-dbd-mysql: for mysqlhotcopy, mysql_convert_table_format and mysql_setpermission
:: Running post-transaction hooks...
(1/5) Creating system user accounts...
Creating group mysql with gid 974.
Creating user mysql (MariaDB) with uid 974 and gid 974.
(2/5) Reloading system manager configuration...
(3/5) Creating temporary files...
(4/5) Arming ConditionNeedsUpdate...
(5/5) Updating the info directory file...
➜ ~

接下来根据安装日志中的提示,初始化MariaDB数据库:

1
mariadb-install-db --user=mysql --basedir=/usr --datadir=/var/lib/mysql

执行日志如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
➜  ~ mariadb-install-db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
Installing MariaDB/MySQL system tables in '/var/lib/mysql' ...
OK

To start mysqld at boot time you have to copy
support-files/mysql.server to the right place for your system


Two all-privilege accounts were created.
One is root@localhost, it has no password, but you need to
be system 'root' user to connect. Use, for example, sudo mysql
The second is mysql@localhost, it has no password either, but
you need to be the system 'mysql' user to connect.
After connecting you can set the password, if you would need to be
able to connect as any of these users with a password and without sudo

See the MariaDB Knowledgebase at https://mariadb.com/kb or the
MySQL manual for more instructions.

You can start the MariaDB daemon with:
cd '/usr' ; /usr/bin/mysqld_safe --datadir='/var/lib/mysql'

You can test the MariaDB daemon with mysql-test-run.pl
cd '/usr/mysql-test' ; perl mysql-test-run.pl

Please report any problems at https://mariadb.org/jira

The latest information about MariaDB is available at https://mariadb.org/.
You can find additional information about the MySQL part at:
https://dev.mysql.com
Consider joining MariaDB's strong and vibrant community:
https://mariadb.org/get-involved/

➜ ~

接下来启动MariaDB服务并设置为开机自启:

1
systemctl enable --now mariadb.service

然后通过以下命令进行安全设置:

1
mysql_secure_installation

然后根据提示设置root用户的密码等,此处设置root用户密码为123456

详细执行日志如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
➜  ~ mysql_secure_installation

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!

In order to log into MariaDB to secure it, we'll need the current
password for the root user. If you've just installed MariaDB, and
haven't set the root password yet, you should just press enter here.

Enter current password for root (enter for none):
OK, successfully used password, moving on...

Setting the root password or using the unix_socket ensures that nobody
can log into the MariaDB root user without the proper authorisation.

You already have your root account protected, so you can safely answer 'n'.

Switch to unix_socket authentication [Y/n] n
... skipping.

You already have your root account protected, so you can safely answer 'n'.

Change the root password? [Y/n] Y
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
... Success!


By default, a MariaDB installation has an anonymous user, allowing anyone
to log into MariaDB without having to have a user account created for
them. This is intended only for testing, and to make the installation
go a bit smoother. You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n] Y
... Success!

Normally, root should only be allowed to connect from 'localhost'. This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n] Y
... Success!

By default, MariaDB comes with a database named 'test' that anyone can
access. This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n] Y
- Dropping test database...
... Success!
- Removing privileges on test database...
... Success!

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n] Y
... Success!

Cleaning up...

All done! If you've completed all of the above steps, your MariaDB
installation should now be secure.

Thanks for using MariaDB!
➜ ~

至此,MariaDB的安装与配置就已经完成了。

配置主从同步

配置Master

主要的配置步骤分为:

  1. 开启Binary Log
  2. 配置唯一的server_id
  3. 使用log-basename为replication logs指定唯一的名称

修改vim /etc/my.cnf.d/server.cnf配置文件,在[mariadb]节点添加如下配置:

1
2
3
4
5
[mariadb]
log-bin
server_id=1
log-basename=master1
binlog-format=mixed

重启MariaDB使更改的配置文件生效:

1
systemctl restart mariadb.service

至此Master的配置结束。

配置Slave

首先修改两台Slave机器的vim /etc/my.cnf.d/server.cnf配置文件,在[mariadb]节点添加server_id配置,需要保证两台slave和一台master三台机器的service_id都唯一。

此处需要开启两个Slave实例的Binary Log,因为当之后配置了MHA,在Master宕机后,只有配置了Binary Log才能晋升为Master。同时关闭中继日志,否则后续MHA校验的时候会打印警告日志。

开启Binary Log并修改Slave One机器的server_id为2:

1
2
3
4
5
6
[mariadb]
log-bin
server_id=2
log-basename=slave1
binlog-format=mixed
relay_log_purge=0 # 关闭中继日志

开启Binary Log并修改Slave Two机器的server_id为3:

1
2
3
4
5
6
[mariadb]
log-bin
server_id=3
log-basename=slave2
binlog-format=mixed
relay_log_purge=0 # 关闭中继日志

然后分别重启Slave One和Slave Two机器上的MariaDB使配置生效:

1
systemctl restart mariadb.service

获取Master Binary Log坐标

在Master机器上,首先通过mysql -u root -p进入MySQL终端,然后通过以下语句查看Binary Log信息:

1
SHOW MASTER STATUS;

输出如下:

1
2
3
4
5
6
7
MariaDB [(none)]> SHOW MASTER STATUS;
+--------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+--------------------+----------+--------------+------------------+
| master1-bin.000001 | 330 | | |
+--------------------+----------+--------------+------------------+
1 row in set (0.000 sec)

记录FilePosition字段的值,以便稍后使用。

启动Slave

先在Master机器执行mysql -u root -p命令,输入密码进入MySQL终端,然后执行以下SQL语句创建专有的用户并授权:

1
2
3
CREATE USER 'replication_user'@'%' IDENTIFIED BY '123456';
GRANT REPLICATION SLAVE ON *.* TO 'replication_user'@'%';
FLUSH PRIVILEGES;

然后在两台Slave机器上,分别通过mysql -u root -p进入MySQL终端,然后执行以下语句来为Slave指定Master:

1
2
3
4
5
6
7
8
CHANGE MASTER TO
MASTER_HOST='192.168.5.131',
MASTER_USER='replication_user',
MASTER_PASSWORD='123456',
MASTER_PORT=3306,
MASTER_LOG_FILE='master1-bin.000001',
MASTER_LOG_POS=330,
MASTER_CONNECT_RETRY=10;

注意保证Master的IP地址准确,并且Binary Log的文件名和位置与上一步查看到的一致。

接下来使用以下命令启动Slave:

1
START SLAVE;

执行完毕后,使用如下命令查看Slave状态,\G表示将输出结果纵向显示,方便查看 :

1
SHOW SLAVE STATUS \G;

输出结果如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
MariaDB [(none)]> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.5.131
Master_User: replication_user
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: master1-bin.000001
Read_Master_Log_Pos: 330
Relay_Log_File: mysql-slave-two-relay-bin.000002
Relay_Log_Pos: 557
Relay_Master_Log_File: master1-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 330
Relay_Log_Space: 876
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 0
1 row in set (0.001 sec)

可以看到在输出结果中包含了:

1
2
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

表示主从同步已经开启完毕。

配置半同步复制

说明:从MariaDB 10.3.3开始,半同步复制功能已经整合进MariaDB,无需再通过插件形式安装。

Master开启半同步复制

修改Master机器的vim /etc/my.cnf.d/server.cnf配置文件,添加半同步复制相关的开关和超时时间配置:

1
2
3
4
[mariadb]
...
rpl_semi_sync_master_enabled=ON
rpl_semi_sync_master_timeout=5000

重启MariaDB使配置生效:

1
systemctl restart mariadb.service

Slave开启半同步复制

修改两台Slave机器的vim /etc/my.cnf.d/server.cnf配置文件,添加半同步复制相关的开关:

1
2
3
[mariadb]
...
rpl_semi_sync_slave_enabled=ON

重启MariaDB使配置生效:

1
systemctl restart mariadb.service

配置MHA

安装MHA Node

MHA Node需要安装在MySQL的主从机,也需要安装在MHA机器,因为MHA Manager依赖MHA Node

分别登录四台机器,执行以下命令安装Perl的DBD::mysql模块和install模块:

1
pacman -S perl-dbd-mysql perl-module-install

然后分别在四台机器下载MHA Node,并且从源码编译安装:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# 下载MHA Node源码
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz
# 解压缩
tar zxvf mha4mysql-node-0.58.tar.gz
# 移除压缩包
rm mha4mysql-node-0.58.tar.gz
# 进入源码目录
cd mha4mysql-node-0.58
# 编译并安装MHA Node
perl Makefile.PL
make
make install
# 清理源码
rm -rf mha4mysql-node-0.58

安装完成后,创建软链接,防止MHA Manager通过SSH登录后无法执行MHA Node相关的命令:

1
2
3
4
5
# 在四台机器分别执行
ln -sf /usr/bin/site_perl/apply_diff_relay_logs /usr/bin/apply_diff_relay_logs
ln -sf /usr/bin/site_perl/filter_mysqlbinlog /usr/bin/filter_mysqlbinlog
ln -sf /usr/bin/site_perl/purge_relay_logs /usr/bin/purge_relay_logs
ln -sf /usr/bin/site_perl/save_binary_logs /usr/bin/save_binary_logs

至此,四台机器上的MHA Node安装完毕。

安装MHA Manager

登录MySQL MHA机器,准备安装MHA ManagerMHA Manager依赖以下模块:

  • MHA Node package
  • DBD::mysql
  • Config::Tiny
  • Log::Dispatch
  • Parallel::ForkManager
  • Time::HiRes

上一步已经安装了MHA NodeDBD::mysql,接下来安装剩余的模块:

1
pacman -S perl-config-tiny perl-parallel-forkmanager

由于Arch Linux官方软件仓库不包含Log::Dispatch,而是在AUR中,此处登录的是root用户,无法安装AUR仓库的内容,所以此处使用perl内置的cpan进行安装:

1
cpan Log::Dispatch

Time::HiRes已经在Perl v5.7.3以后内置,所以此处不用安装。

接下来下载MHA Manager源码并编译安装:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# 下载源码
wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
# 解压缩
tar zxvf mha4mysql-manager-0.58.tar.gz
# 移除压缩包
rm mha4mysql-manager-0.58.tar.gz
# 进入源码目录
cd mha4mysql-manager-0.58
# 编译并安装MHA Node
perl Makefile.PL
make
make install
# 清理源码
rm -rf mha4mysql-manager-0.58

至此,MHA Manager安装完毕。

配置MHA Manager

首先登录MySQL Master机器,创建MHA Manager用户并授权:

1
2
3
CREATE USER 'mha_user'@'%' IDENTIFIED BY '123456';
GRANT ALL ON *.* TO 'mha_user'@'%';
FLUSH PRIVILEGES;

然后登录MySQL MHA机器,创建MHA Manager配置文件:

1
vim /etc/mha_default_app.cnf

写入以下内容:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
[server default]
user=mha_user
password=123456
repl_user=replication_user
repl_password=123456
ssh_user=root
manager_workdir=/var/log/masterha/default_app
manager_log=/var/log/masterha/default_app/default_app.log
remote_workdir=/var/log/masterha/default_app
ping_interval=3

[server1]
# Master
hostname=192.168.5.131

[server2]
# Slave One
hostname=192.168.5.130

[server3]
# Slave Two
hostname=192.168.5.132

创建日志文件夹:

1
mkdir -p /var/log/masterha/default_app

配置SSH密钥登录,使得MySQL MHA可以登录三台MySQL机器,然后三台MySQL可以互相登录:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
## 首先登录到MySQL MHA机器进行配置
# 生成密钥,所有选项回车确认即可。
ssh-keygen -t rsa -b 4096
# 将公钥发送到三台MySQL机器:
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.5.131 # MySQL Master
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.5.130 # MySQL Slave One
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.5.132 # MySQL Slave Two

## 然后登录到MySQL Master机器进行配置
# 生成密钥,同上。
ssh-keygen -t rsa -b 4096
# 将公钥发送到两台MySQL Slave机器:
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.5.130 # MySQL Slave One
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.5.132 # MySQL Slave Two


## 然后登录到MySQL Slave One机器进行配置
# 生成密钥,同上。
ssh-keygen -t rsa -b 4096
# 将公钥发送到MySQL Master和MySQL Slave Two机器:
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.5.131 # MySQL Master
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.5.132 # MySQL Slave Two


## 然后登录到MySQL Slave Two机器进行配置
# 生成密钥,同上。
ssh-keygen -t rsa -b 4096
# 将公钥发送到MySQL Master和MySQL Slave One机器,中途会要求输入这两台机器root用户的密码:
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.5.131 # MySQL Master
ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.5.130 # MySQL Slave One

回到MySQL MHA机器,执行以下命令校验配置是否有误:

1
2
3
4
# 校验SSH,需要出现All SSH connection tests passed successfully代表成功。
masterha_check_ssh --conf=/etc/mha_default_app.cnf
# 校验主从复制,需要出现MySQL Replication Health is OK代表成功。
masterha_check_repl --conf=/etc/mha_default_app.cnf

可能出现的错误

  1. 如果校验的时候遇到以下错误:

    1
    [error][/usr/share/perl5/site_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. Redundant argument in sprintf at /usr/share/perl5/site_perl/MHA/NodeUtil.pm line 201.

    这是因为获取MySQL版本的时候出错,因为Arch Linux安装的MariaDB版本为 10.5.5-2,版本号中有短横线,而原有代码无法处理有短横线的情况。

    可以修改全部四台机器的/usr/share/perl5/site_perl/MHA/NodeUtil.pm,修改的地方如下:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    # 找到 NodeUtil.pm 201行附近的parse_mysql_major_version方法,原方法为:
    sub parse_mysql_major_version($) {
    my $str = shift;
    my $result = sprintf( '%03d%03d', $str =~ m/(\d+)/g );
    return $result;
    }

    # 修改如下,新增了两行,然后下一行sprintf方法内原有的$str更换为刚新增的$strmajor即可。
    sub parse_mysql_major_version($) {
    my $str = shift;
    $str =~ /(\d+)\.(\d+)/;
    my $strmajor = "$1.$2";
    my $result = sprintf( '%03d%03d', $strmajor =~ m/(\d+)/g );
    return $result;
    }
  2. 如果出现以下错误:

    1
    Checking if super_read_only is defined and turned on..DBD::mysql::st execute failed: Unknown system variable 'super_read_only' at /usr/share/perl5/site_perl/MHA/SlaveUtil.pm line 245.

    这是因为MariaDB没有super_read_only变量导致。解决方法是修改全部四台机器的/usr/share/perl5/site_perl/MHA/SlaveUtil.pm文件,在230行左右找到check_if_super_read_only方法,然后删除掉该方法的方法体,直接返回0即可。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    # 修改前
    sub check_if_super_read_only {
    ...
    if ( $href->{Value} == '1' ) {
    return (1, $dbh);
    }
    return (0, 0);
    }

    # 修改后
    sub check_if_super_read_only {
    return (0, 0);
    }

由于这些文件是只读文件,使用vim修改后需要使用:wq!强制保存更改并退出。

如果SSH校验和主从复制校验都成功,则可以启动MHA Manager

1
nohup masterha_manager --conf=/etc/mha_default_app.cnf < /dev/null > /var/log/masterha/default_app/default_app.log 2>&1 &

查看MHA Manager状态:

1
masterha_check_status --conf=/etc/mha_default_app.cnf

输出如下:

1
2
3
➜  ~ masterha_check_status --conf=/etc/mha_default_app.cnf
mha_default_app (pid:544) is running(0:PING_OK), master:192.168.5.131
➜ ~

查看MHA Manager日志:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
➜  ~ tail -n 100 /var/log/masterha/default_app/default_app.log 
Fri Sep 11 10:07:31 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep 11 10:07:31 2020 - [info] Reading application default configuration from /etc/mha_default_app.cnf..
Fri Sep 11 10:07:31 2020 - [info] Reading server configuration from /etc/mha_default_app.cnf..
Fri Sep 11 10:07:31 2020 - [info] MHA::MasterMonitor version 0.58.
Fri Sep 11 10:07:32 2020 - [info] GTID failover mode = 0
Fri Sep 11 10:07:32 2020 - [info] Dead Servers:
Fri Sep 11 10:07:32 2020 - [info] Alive Servers:
Fri Sep 11 10:07:32 2020 - [info] 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 10:07:32 2020 - [info] 192.168.5.130(192.168.5.130:3306)
Fri Sep 11 10:07:32 2020 - [info] 192.168.5.132(192.168.5.132:3306)
Fri Sep 11 10:07:32 2020 - [info] Alive Slaves:
Fri Sep 11 10:07:32 2020 - [info] 192.168.5.130(192.168.5.130:3306) Version=10.5.5-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Sep 11 10:07:32 2020 - [info] Replicating from 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 10:07:32 2020 - [info] 192.168.5.132(192.168.5.132:3306) Version=10.5.5-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Sep 11 10:07:32 2020 - [info] Replicating from 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 10:07:32 2020 - [info] Current Alive Master: 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 10:07:32 2020 - [info] Checking slave configurations..
Fri Sep 11 10:07:32 2020 - [info] read_only=1 is not set on slave 192.168.5.130(192.168.5.130:3306).
Fri Sep 11 10:07:32 2020 - [info] read_only=1 is not set on slave 192.168.5.132(192.168.5.132:3306).
Fri Sep 11 10:07:32 2020 - [info] Checking replication filtering settings..
Fri Sep 11 10:07:32 2020 - [info] binlog_do_db= , binlog_ignore_db=
Fri Sep 11 10:07:32 2020 - [info] Replication filtering check ok.
Fri Sep 11 10:07:32 2020 - [info] GTID (with auto-pos) is not supported
Fri Sep 11 10:07:32 2020 - [info] Starting SSH connection tests..
Fri Sep 11 10:07:35 2020 - [info] All SSH connection tests passed successfully.
Fri Sep 11 10:07:35 2020 - [info] Checking MHA Node version..
Fri Sep 11 10:07:35 2020 - [info] Version check ok.
Fri Sep 11 10:07:35 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Sep 11 10:07:35 2020 - [info] HealthCheck: SSH to 192.168.5.131 is reachable.
Fri Sep 11 10:07:36 2020 - [info] Master MHA Node version is 0.58.
Fri Sep 11 10:07:36 2020 - [info] Checking recovery script configurations on 192.168.5.131(192.168.5.131:3306)..
Fri Sep 11 10:07:36 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/default_app/save_binary_logs_test --manager_version=0.58 --start_file=master1-bin.000008
Fri Sep 11 10:07:36 2020 - [info] Connecting to [email protected](192.168.5.131:22)..
Creating /var/log/masterha/default_app if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to master1-bin.000008
Fri Sep 11 10:07:36 2020 - [info] Binlog setting check done.
Fri Sep 11 10:07:36 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Sep 11 10:07:36 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_user' --slave_host=192.168.5.130 --slave_ip=192.168.5.130 --slave_port=3306 --workdir=/var/log/masterha/default_app --target_version=10.5.5-MariaDB-log --manager_version=0.58 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Sep 11 10:07:36 2020 - [info] Connecting to [email protected](192.168.5.130:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to slave2-relay-bin.000008
Temporary relay log file is /var/lib/mysql/slave2-relay-bin.000008
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Sep 11 10:07:36 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_user' --slave_host=192.168.5.132 --slave_ip=192.168.5.132 --slave_port=3306 --workdir=/var/log/masterha/default_app --target_version=10.5.5-MariaDB-log --manager_version=0.58 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Sep 11 10:07:36 2020 - [info] Connecting to [email protected](192.168.5.132:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to slave1-relay-bin.000008
Temporary relay log file is /var/lib/mysql/slave1-relay-bin.000008
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Sep 11 10:07:37 2020 - [info] Slaves settings check done.
Fri Sep 11 10:07:37 2020 - [info]
192.168.5.131(192.168.5.131:3306) (current master)
+--192.168.5.130(192.168.5.130:3306)
+--192.168.5.132(192.168.5.132:3306)

Fri Sep 11 10:07:37 2020 - [warning] master_ip_failover_script is not defined.
Fri Sep 11 10:07:37 2020 - [warning] shutdown_script is not defined.
Fri Sep 11 10:07:37 2020 - [info] Set master ping interval 3 seconds.
Fri Sep 11 10:07:37 2020 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Fri Sep 11 10:07:37 2020 - [info] Starting ping health check on 192.168.5.131(192.168.5.131:3306)..
Fri Sep 11 10:07:37 2020 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
➜ ~

接下来可以测试以下Master宕机了是否会切换新的Master。

关闭MySQL Master机器的MariaDB:

1
systemctl stop mariadb.service

此时查看MHA Manager日志,可以看到,在日志的最后,Master从192.168.5.131转移到了192.168.5.130,完成了主库切换:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
Fri Sep 11 11:57:45 2020 - [warning] Got error on MySQL select ping: 1927 (Connection was killed)
Fri Sep 11 11:57:45 2020 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/default_app/save_binary_logs_test --manager_version=0.58 --binlog_prefix=master1-bin
Fri Sep 11 11:57:45 2020 - [warning] HealthCheck: SSH to 192.168.5.131 is NOT reachable.
Fri Sep 11 11:57:51 2020 - [warning] Got error on MySQL connect: 2002 (Can't connect to MySQL server on '192.168.5.131' (110))
Fri Sep 11 11:57:51 2020 - [warning] Connection failed 2 time(s)..
Fri Sep 11 11:57:54 2020 - [warning] Got error on MySQL connect: 2002 (Can't connect to MySQL server on '192.168.5.131' (110))
Fri Sep 11 11:57:54 2020 - [warning] Connection failed 3 time(s)..
Fri Sep 11 11:57:57 2020 - [warning] Got error on MySQL connect: 2002 (Can't connect to MySQL server on '192.168.5.131' (110))
Fri Sep 11 11:57:57 2020 - [warning] Connection failed 4 time(s)..
Fri Sep 11 11:57:57 2020 - [warning] Master is not reachable from health checker!
Fri Sep 11 11:57:57 2020 - [warning] Master 192.168.5.131(192.168.5.131:3306) is not reachable!
Fri Sep 11 11:57:57 2020 - [warning] SSH is NOT reachable.
Fri Sep 11 11:57:57 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha_default_app.cnf again, and trying to connect to all servers to check server status..
Fri Sep 11 11:57:57 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep 11 11:57:57 2020 - [info] Reading application default configuration from /etc/mha_default_app.cnf..
Fri Sep 11 11:57:57 2020 - [info] Reading server configuration from /etc/mha_default_app.cnf..
Fri Sep 11 11:57:58 2020 - [info] GTID failover mode = 0
Fri Sep 11 11:57:58 2020 - [info] Dead Servers:
Fri Sep 11 11:57:58 2020 - [info] 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 11:57:58 2020 - [info] Alive Servers:
Fri Sep 11 11:57:58 2020 - [info] 192.168.5.130(192.168.5.130:3306)
Fri Sep 11 11:57:58 2020 - [info] 192.168.5.132(192.168.5.132:3306)
Fri Sep 11 11:57:58 2020 - [info] Alive Slaves:
Fri Sep 11 11:57:58 2020 - [info] 192.168.5.130(192.168.5.130:3306) Version=10.5.5-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Sep 11 11:57:58 2020 - [info] Replicating from 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 11:57:58 2020 - [info] 192.168.5.132(192.168.5.132:3306) Version=10.5.5-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Sep 11 11:57:58 2020 - [info] Replicating from 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 11:57:58 2020 - [info] Checking slave configurations..
Fri Sep 11 11:57:58 2020 - [info] read_only=1 is not set on slave 192.168.5.130(192.168.5.130:3306).
Fri Sep 11 11:57:58 2020 - [info] read_only=1 is not set on slave 192.168.5.132(192.168.5.132:3306).
Fri Sep 11 11:57:58 2020 - [info] Checking replication filtering settings..
Fri Sep 11 11:57:58 2020 - [info] Replication filtering check ok.
Fri Sep 11 11:57:58 2020 - [info] Master is down!
Fri Sep 11 11:57:58 2020 - [info] Terminating monitoring script.
Fri Sep 11 11:57:58 2020 - [info] Got exit code 20 (Master dead).
Fri Sep 11 11:57:58 2020 - [info] MHA::MasterFailover version 0.58.
Fri Sep 11 11:57:58 2020 - [info] Starting master failover.
Fri Sep 11 11:57:58 2020 - [info]
Fri Sep 11 11:57:58 2020 - [info] * Phase 1: Configuration Check Phase..
Fri Sep 11 11:57:58 2020 - [info]
Fri Sep 11 11:57:59 2020 - [info] GTID failover mode = 0
Fri Sep 11 11:57:59 2020 - [info] Dead Servers:
Fri Sep 11 11:57:59 2020 - [info] 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 11:57:59 2020 - [info] Checking master reachability via MySQL(double check)...
Fri Sep 11 11:58:00 2020 - [info] ok.
Fri Sep 11 11:58:00 2020 - [info] Alive Servers:
Fri Sep 11 11:58:00 2020 - [info] 192.168.5.130(192.168.5.130:3306)
Fri Sep 11 11:58:00 2020 - [info] 192.168.5.132(192.168.5.132:3306)
Fri Sep 11 11:58:00 2020 - [info] Alive Slaves:
Fri Sep 11 11:58:00 2020 - [info] 192.168.5.130(192.168.5.130:3306) Version=10.5.5-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Sep 11 11:58:00 2020 - [info] Replicating from 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 11:58:00 2020 - [info] 192.168.5.132(192.168.5.132:3306) Version=10.5.5-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Sep 11 11:58:00 2020 - [info] Replicating from 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 11:58:00 2020 - [info] Starting Non-GTID based failover.
Fri Sep 11 11:58:00 2020 - [info]
Fri Sep 11 11:58:00 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Sep 11 11:58:00 2020 - [info]
Fri Sep 11 11:58:00 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Sep 11 11:58:00 2020 - [info]
Fri Sep 11 11:58:00 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Sep 11 11:58:00 2020 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Fri Sep 11 11:58:00 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Sep 11 11:58:01 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] * Phase 3: Master Recovery Phase..
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] The latest binary log file/position on all slaves is master1-bin.000001:1759
Fri Sep 11 11:58:01 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Sep 11 11:58:01 2020 - [info] 192.168.5.130(192.168.5.130:3306) Version=10.5.5-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Sep 11 11:58:01 2020 - [info] Replicating from 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 11:58:01 2020 - [info] 192.168.5.132(192.168.5.132:3306) Version=10.5.5-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Sep 11 11:58:01 2020 - [info] Replicating from 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 11:58:01 2020 - [info] The oldest binary log file/position on all slaves is master1-bin.000001:1759
Fri Sep 11 11:58:01 2020 - [info] Oldest slaves:
Fri Sep 11 11:58:01 2020 - [info] 192.168.5.130(192.168.5.130:3306) Version=10.5.5-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Sep 11 11:58:01 2020 - [info] Replicating from 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 11:58:01 2020 - [info] 192.168.5.132(192.168.5.132:3306) Version=10.5.5-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Sep 11 11:58:01 2020 - [info] Replicating from 192.168.5.131(192.168.5.131:3306)
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] * Phase 3.3: Determining New Master Phase..
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Sep 11 11:58:01 2020 - [info] All slaves received relay logs to the same position. No need to resync each other.
Fri Sep 11 11:58:01 2020 - [info] Searching new master from slaves..
Fri Sep 11 11:58:01 2020 - [info] Candidate masters from the configuration file:
Fri Sep 11 11:58:01 2020 - [info] Non-candidate masters:
Fri Sep 11 11:58:01 2020 - [info] New master is 192.168.5.130(192.168.5.130:3306)
Fri Sep 11 11:58:01 2020 - [info] Starting master failover..
Fri Sep 11 11:58:01 2020 - [info]
From:
192.168.5.131(192.168.5.131:3306) (current master)
+--192.168.5.130(192.168.5.130:3306)
+--192.168.5.132(192.168.5.132:3306)

To:
192.168.5.130(192.168.5.130:3306) (new master)
+--192.168.5.132(192.168.5.132:3306)
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] * Phase 3.5: Master Log Apply Phase..
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Sep 11 11:58:01 2020 - [info] Starting recovery on 192.168.5.130(192.168.5.130:3306)..
Fri Sep 11 11:58:01 2020 - [info] This server has all relay logs. Waiting all logs to be applied..
Fri Sep 11 11:58:01 2020 - [info] done.
Fri Sep 11 11:58:01 2020 - [info] All relay logs were successfully applied.
Fri Sep 11 11:58:01 2020 - [info] Getting new master's binlog name and position..
Fri Sep 11 11:58:01 2020 - [info] slave2-bin.000001:329
Fri Sep 11 11:58:01 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.5.130', MASTER_PORT=3306, MASTER_LOG_FILE='slave2-bin.000001', MASTER_LOG_POS=329, MASTER_USER='replication_user', MASTER_PASSWORD='xxx';
Fri Sep 11 11:58:01 2020 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Fri Sep 11 11:58:01 2020 - [info] ** Finished master recovery successfully.
Fri Sep 11 11:58:01 2020 - [info] * Phase 3: Master Recovery Phase completed.
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] * Phase 4: Slaves Recovery Phase..
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Sep 11 11:58:01 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] -- Slave diff file generation on host 192.168.5.132(192.168.5.132:3306) started, pid: 2345. Check tmp log /var/log/masterha/default_app/192.168.5.132_3306_20200911115758.log if it takes time..
Fri Sep 11 11:58:02 2020 - [info]
Fri Sep 11 11:58:02 2020 - [info] Log messages from 192.168.5.132 ...
Fri Sep 11 11:58:02 2020 - [info]
Fri Sep 11 11:58:01 2020 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Fri Sep 11 11:58:02 2020 - [info] End of log messages from 192.168.5.132.
Fri Sep 11 11:58:02 2020 - [info] -- 192.168.5.132(192.168.5.132:3306) has the latest relay log events.
Fri Sep 11 11:58:02 2020 - [info] Generating relay diff files from the latest slave succeeded.
Fri Sep 11 11:58:02 2020 - [info]
Fri Sep 11 11:58:02 2020 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Sep 11 11:58:02 2020 - [info]
Fri Sep 11 11:58:02 2020 - [info] -- Slave recovery on host 192.168.5.132(192.168.5.132:3306) started, pid: 2347. Check tmp log /var/log/masterha/default_app/192.168.5.132_3306_20200911115758.log if it takes time..
Fri Sep 11 11:58:03 2020 - [info]
Fri Sep 11 11:58:03 2020 - [info] Log messages from 192.168.5.132 ...
Fri Sep 11 11:58:03 2020 - [info]
Fri Sep 11 11:58:02 2020 - [info] Starting recovery on 192.168.5.132(192.168.5.132:3306)..
Fri Sep 11 11:58:02 2020 - [info] This server has all relay logs. Waiting all logs to be applied..
Fri Sep 11 11:58:02 2020 - [info] done.
Fri Sep 11 11:58:02 2020 - [info] All relay logs were successfully applied.
Fri Sep 11 11:58:02 2020 - [info] Resetting slave 192.168.5.132(192.168.5.132:3306) and starting replication from the new master 192.168.5.130(192.168.5.130:3306)..
Fri Sep 11 11:58:02 2020 - [info] Executed CHANGE MASTER.
Fri Sep 11 11:58:02 2020 - [info] Slave started.
Fri Sep 11 11:58:03 2020 - [info] End of log messages from 192.168.5.132.
Fri Sep 11 11:58:03 2020 - [info] -- Slave recovery on host 192.168.5.132(192.168.5.132:3306) succeeded.
Fri Sep 11 11:58:03 2020 - [info] All new slave servers recovered successfully.
Fri Sep 11 11:58:03 2020 - [info]
Fri Sep 11 11:58:03 2020 - [info] * Phase 5: New master cleanup phase..
Fri Sep 11 11:58:03 2020 - [info]
Fri Sep 11 11:58:03 2020 - [info] Resetting slave info on the new master..
Fri Sep 11 11:58:03 2020 - [info] 192.168.5.130: Resetting slave info succeeded.
Fri Sep 11 11:58:03 2020 - [info] Master failover to 192.168.5.130(192.168.5.130:3306) completed successfully.
Fri Sep 11 11:58:03 2020 - [info]

----- Failover Report -----

mha_default_app: MySQL Master failover 192.168.5.131(192.168.5.131:3306) to 192.168.5.130(192.168.5.130:3306) succeeded

Master 192.168.5.131(192.168.5.131:3306) is down!

Check MHA Manager logs at mysql-mha:/var/log/masterha/default_app/default_app.log for details.

Started automated(non-interactive) failover.
The latest slave 192.168.5.130(192.168.5.130:3306) has all relay logs for recovery.
Selected 192.168.5.130(192.168.5.130:3306) as a new master.
192.168.5.130(192.168.5.130:3306): OK: Applying all logs succeeded.
192.168.5.132(192.168.5.132:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.5.132(192.168.5.132:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.5.130(192.168.5.130:3306)
192.168.5.130(192.168.5.130:3306): Resetting slave info succeeded.
Master failover to 192.168.5.130(192.168.5.130:3306) completed successfully.

至此,MHA架构搭建完毕。