Edmunds Vesmanis gave a presentation at Zabbix Summit 2019 about Zabbix HA setups (video on YouTube), and he also wrote a post on the Zabbix blog titled High Availability cluster building with Zabbix for continued service: 3 database servers, 3 Zabbix servers and 3 Zabbix frontend servers. His configuration was based on RHEL or CentOS, and the commands are somewhat different on Debian 10 Buster. So here is my version of Edmunds’s setup on Debian.
You still need to see the original post or the video (linked above) for more background on the idea, as I’m mostly going to show just commands here. I also changed some bits here and there, such as using GTID-based replication.
Versions used (everything from Debian repo except Zabbix):
IP addresses and hostnames to be used, appended to /etc/hosts on every server:
# VIPs
192.168.7.87 zabbix-ha-app
192.168.7.88 zabbix-ha-fe
192.168.7.89 zabbix-ha-db
# Front-end nodes
192.168.7.90 zabbix-ha-fe1
192.168.7.91 zabbix-ha-fe2
192.168.7.92 zabbix-ha-fe3
# Zabbix server nodes
192.168.7.93 zabbix-ha-srv1
192.168.7.94 zabbix-ha-srv2
192.168.7.95 zabbix-ha-srv3
# Database nodes
192.168.7.96 zabbix-ha-db1
192.168.7.97 zabbix-ha-db2
192.168.7.99 zabbix-ha-db3
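The same block has to go on all nine servers, so here is a small sketch that appends it only once. The function names (cluster_hosts, append_cluster_hosts) are made up for this example. The script assumes the block is already present if the zabbix-ha-app VIP line is found. The target file defaults to /etc/hosts, but you can point it at a scratch copy first.

```shell
#!/usr/bin/env bash
# Sketch: add the cluster's host entries to a hosts file exactly once.

# Print the entries used in this setup.
cluster_hosts() {
  cat <<'EOF'
# VIPs
192.168.7.87 zabbix-ha-app
192.168.7.88 zabbix-ha-fe
192.168.7.89 zabbix-ha-db
# Front-end nodes
192.168.7.90 zabbix-ha-fe1
192.168.7.91 zabbix-ha-fe2
192.168.7.92 zabbix-ha-fe3
# Zabbix server nodes
192.168.7.93 zabbix-ha-srv1
192.168.7.94 zabbix-ha-srv2
192.168.7.95 zabbix-ha-srv3
# Database nodes
192.168.7.96 zabbix-ha-db1
192.168.7.97 zabbix-ha-db2
192.168.7.99 zabbix-ha-db3
EOF
}

# Append the entries to the given file (default /etc/hosts) unless the
# zabbix-ha-app VIP line is already there, so reruns add nothing.
append_cluster_hosts() {
  local target="${1:-/etc/hosts}"
  if grep -Eq '^192\.168\.7\.87[[:space:]]+zabbix-ha-app([[:space:]]|$)' "$target" 2>/dev/null; then
    return 0
  fi
  cluster_hosts >> "$target"
}
```

On each server, as root, source the script and run append_cluster_hosts without arguments.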
Setting up the database servers
On every database server:
sudo -i
vi /etc/hosts # set the hosts file as mentioned above
apt install corosync pacemaker pcs
echo hacluster:Zabbix123 | chpasswd
# Debian has a cluster configured already, ignore the config:
mv /etc/corosync/corosync.conf /etc/corosync/corosync.conf.orig
apt install mariadb-server
systemctl stop mariadb
On every database server, create a configuration file /etc/mysql/mariadb.conf.d/90-zabbix.cnf:
[mysqld]
skip_name_resolve
bind_address = 0.0.0.0
log_slave_updates
max_binlog_size = 1G
expire_logs_days = 5
innodb_buffer_pool_size = 1G # 70-80% of total RAM
innodb_buffer_pool_instances = 1 # each instance should be at least 1GB
innodb_flush_log_at_trx_commit = 2 # default = 1
innodb_flush_method = O_DIRECT # default = fsync
innodb_io_capacity = 500 # HDD = 500-800, SSD = 2000
query_cache_size = 0
# Change the following values for each server accordingly!
log_basename = zabbix-ha-db1
log_bin = zabbix-ha-db1-bin
server_id = 96 # The last number of the server IP address
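The values that change per server follow a simple pattern, so they can be generated instead of typed by hand. This is a minimal sketch with two assumptions: server_id is the last octet of the node's IP address (as the comment above says), and the buffer pool gets 75% of RAM, the middle of the 70-80% rule of thumb. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the values in 90-zabbix.cnf that depend on the machine.
# Assumptions: server_id is the last octet of the node's IP address, and
# the InnoDB buffer pool gets 75% of RAM (middle of the 70-80% guideline).

# Last octet of a dotted IPv4 address, e.g. 192.168.7.97 -> 97.
server_id_from_ip() {
  printf '%s\n' "${1##*.}"
}

# 75% of a RAM size given in MiB, rounded down.
buffer_pool_mb() {
  printf '%s\n' "$(( $1 * 75 / 100 ))"
}

# Total RAM of this machine in MiB, read from /proc/meminfo (Linux).
total_ram_mb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Usage: render_node_settings HOSTNAME IP [RAM_MB]
render_node_settings() {
  local host="$1" ip="$2" ram_mb="${3:-$(total_ram_mb)}"
  cat <<EOF
innodb_buffer_pool_size = $(buffer_pool_mb "$ram_mb")M
log_basename = ${host}
log_bin = ${host}-bin
server_id = $(server_id_from_ip "$ip")
EOF
}
```

For example, render_node_settings zabbix-ha-db2 192.168.7.97 prints the lines for the second node. When the third argument is omitted, it sizes the buffer pool from the RAM reported in /proc/meminfo.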
Start MariaDB again on every server:
systemctl start mariadb
On the first server only:
pcs host auth zabbix-ha-db1 zabbix-ha-db2 zabbix-ha-db3 -u hacluster -p Zabbix123
pcs cluster setup zabbix_db_cluster zabbix-ha-db1 zabbix-ha-db2 zabbix-ha-db3 --force
pcs cluster start --all
systemctl enable corosync pacemaker
pcs property set stonith-enabled=false
pcs resource defaults resource-stickiness=100
pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.7.89 op monitor interval=5s --group zabbix_db_cluster
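The same bootstrap pattern is repeated later for the Zabbix server and frontend clusters: authenticate, set up, start, disable STONITH, set stickiness, and add a VIP in a group named after the cluster. Here is a sketch of a hypothetical helper that only prints those commands for review. It reads the hacluster password from a HACLUSTER_PASS variable and falls back to the example password used in this post.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the pcs bootstrap commands used for each cluster here.
# Usage: print_pcs_bootstrap CLUSTER VIP_RESOURCE VIP NODE...
# The resource group gets the same name as the cluster, as in this post.
print_pcs_bootstrap() {
  local cluster="$1" vip_resource="$2" vip="$3"
  shift 3
  local nodes="$*"
  local pass="${HACLUSTER_PASS:-Zabbix123}"   # example password from above
  cat <<EOF
pcs host auth ${nodes} -u hacluster -p ${pass}
pcs cluster setup ${cluster} ${nodes} --force
pcs cluster start --all
systemctl enable corosync pacemaker
pcs property set stonith-enabled=false
pcs resource defaults resource-stickiness=100
pcs resource create ${vip_resource} ocf:heartbeat:IPaddr2 ip=${vip} op monitor interval=5s --group ${cluster}
EOF
}
```

For example, print_pcs_bootstrap zabbix_db_cluster virtual_ip 192.168.7.89 zabbix-ha-db1 zabbix-ha-db2 zabbix-ha-db3 > bootstrap.sh writes the commands to a file. Check the file, then run it with bash on the first node.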
On the second and third servers, enable the cluster services:
systemctl enable corosync pacemaker
Check the cluster status with the pcs status command. Example output:
root@zabbix-ha-db1:~# pcs status
Cluster name: zabbix_db_cluster
Stack: corosync
Current DC: zabbix-ha-db3 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Sat Jan 11 19:03:44 2020
Last change: Sat Jan 11 18:48:10 2020 by root via cibadmin on zabbix-ha-db2
3 nodes configured
1 resource configured
Online: [ zabbix-ha-db1 zabbix-ha-db2 zabbix-ha-db3 ]
Full list of resources:
Resource Group: zabbix_db_cluster
virtual_ip (ocf::heartbeat:IPaddr2): Started zabbix-ha-db1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
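To check from a script where a resource is currently running, the Started lines of that output are enough. This is a rough sketch: pcs status output is meant for humans and its layout may change between versions. resource_node is a made-up name.

```shell
#!/usr/bin/env bash
# Sketch: print the node a resource is "Started" on, reading pcs status output
# from stdin. Exit status is 1 if no Started line for the resource is found.
resource_node() {
  awk -v r="$1" '
    $1 == r && $(NF - 1) == "Started" { print $NF; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For the example output above, pcs status | resource_node virtual_ip prints zabbix-ha-db1.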
Now let’s configure the database replication.
On the first server, start mysql and enter the following commands:
stop slave;
grant replication slave on *.* to 'replicator'@'192.168.7.97' identified by 'Password456';
show global variables like 'gtid_current_pos';
Output example for the GTID position:
MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+--------+
| Variable_name | Value |
+------------------+--------+
| gtid_current_pos | 0-96-1 |
+------------------+--------+
1 row in set (0.002 sec)
MariaDB [(none)]>
Make a note of the position (“0-96-1” in this example).
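You can also read the position non-interactively. A MariaDB GTID has the form domain_id-server_id-sequence_number, so 0-96-1 is domain 0, written by server_id 96 (zabbix-ha-db1), sequence 1. Below is a small sketch with hypothetical helper names. read_gtid_current_pos assumes the local root account can log in without a password, as with the default unix_socket login on Debian.

```shell
#!/usr/bin/env bash
# Sketch: read and sanity-check a MariaDB GTID position.
# GTID format: domain_id-server_id-sequence_number, e.g. 0-96-1.

# Print gtid_current_pos without column headers or table borders.
# Assumes the local root account can log in without a password.
read_gtid_current_pos() {
  mysql -N -B -e "SELECT @@GLOBAL.gtid_current_pos"
}

# Succeeds if $1 is a comma-separated list of GTIDs (one per domain).
is_gtid_list() {
  local re='^[0-9]+-[0-9]+-[0-9]+(,[0-9]+-[0-9]+-[0-9]+)*$'
  [[ $1 =~ $re ]]
}

# Print the server_id part of a single GTID, e.g. 0-96-1 -> 96.
gtid_server_id() {
  local rest="${1#*-}"
  printf '%s\n' "${rest%%-*}"
}
```

For example: pos="$(read_gtid_current_pos)" && is_gtid_list "$pos" && echo "$pos".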
On the second server (zabbix-ha-db2), start mysql and enter the following commands:
stop slave;
set global gtid_slave_pos = '0-96-1'; # The GTID you noted earlier
change master to master_host='192.168.7.96', master_user='replicator', master_password='Password456', master_use_gtid=slave_pos;
grant replication slave on *.* to 'replicator'@'192.168.7.99' identified by 'Password456';
reset master;
start slave;
show slave status\G
Output example for the slave status:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.7.96
Master_User: replicator
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: zabbix-ha-db1-bin.000002
Read_Master_Log_Pos: 350
Relay_Log_File: zabbix-ha-db2-relay-bin.000002
Relay_Log_Pos: 657
Relay_Master_Log_File: zabbix-ha-db1-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 350
Relay_Log_Space: 974
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 96
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-96-1
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
MariaDB [(none)]>
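In that output, Slave_IO_Running and Slave_SQL_Running are the fields that matter most, and both should say Yes. Below is a sketch of a check that reads the \G output on stdin. replication_ok is a hypothetical name, and it ignores everything except those two flags and Seconds_Behind_Master.

```shell
#!/usr/bin/env bash
# Sketch: check "show slave status\G" output read from stdin.
# Prints the two thread states and the lag, and succeeds only if both
# the I/O and SQL threads report Yes.
replication_ok() {
  awk -F': ' '
    { gsub(/^[ \t]+/, "", $1) }              # real output is right-aligned
    $1 == "Slave_IO_Running"      { io = $2 }
    $1 == "Slave_SQL_Running"     { sql = $2 }
    $1 == "Seconds_Behind_Master" { lag = $2 }
    END {
      printf "io=%s sql=%s lag=%s\n", io, sql, lag
      ok = (io == "Yes" && sql == "Yes")
      exit (ok ? 0 : 1)
    }
  '
}
```

For example, run mysql -e 'show slave status\G' | replication_ok on any of the database nodes.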
On the third server (zabbix-ha-db3), start mysql and enter the following commands:
stop slave;
set global gtid_slave_pos = '0-96-1'; # The same as noted earlier
change master to master_host='192.168.7.97', master_user='replicator', master_password='Password456', master_use_gtid=slave_pos;
grant replication slave on *.* to 'replicator'@'192.168.7.96' identified by 'Password456';
reset master;
start slave;
show slave status\G
Output example on zabbix-ha-db3:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.7.97
Master_User: replicator
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: zabbix-ha-db2-bin.000001
Read_Master_Log_Pos: 336
Relay_Log_File: zabbix-ha-db3-relay-bin.000002
Relay_Log_Pos: 643
Relay_Master_Log_File: zabbix-ha-db2-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-96-1
...
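The statements on zabbix-ha-db2 and zabbix-ha-db3 differ only in the IP addresses, so they can be generated. Here is a sketch with a hypothetical replica_sql helper. It assumes the replication password contains no single quotes, because the password is pasted into SQL string literals as-is.

```shell
#!/usr/bin/env bash
# Sketch: print the replica setup statements used on db2 and db3 above.
# Usage: replica_sql MASTER_IP GTID NEXT_NODE_IP REPLICATION_PASSWORD
# NEXT_NODE_IP is the node that will replicate from this one.
replica_sql() {
  local master_ip="$1" gtid="$2" next_ip="$3" repl_pass="$4"
  cat <<EOF
stop slave;
set global gtid_slave_pos = '${gtid}';
change master to master_host='${master_ip}', master_user='replicator', master_password='${repl_pass}', master_use_gtid=slave_pos;
grant replication slave on *.* to 'replicator'@'${next_ip}' identified by '${repl_pass}';
reset master;
start slave;
EOF
}
```

On zabbix-ha-db3, that would be replica_sql 192.168.7.97 0-96-1 192.168.7.96 Password456 | mysql, followed by the same show slave status\G check.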
On the first database server again (zabbix-ha-db1), start mysql and enter the following commands to complete the ring replication:
stop slave;
set global gtid_slave_pos = '0-96-1';
change master to master_host='192.168.7.99', master_user='replicator', master_password='Password456', master_use_gtid=slave_pos;
start slave;
show slave status\G
Output example on the first server:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.7.99
Master_User: replicator
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: zabbix-ha-db3-bin.000001
Read_Master_Log_Pos: 336
Relay_Log_File: zabbix-ha-db1-relay-bin.000002
Relay_Log_Pos: 643
Relay_Master_Log_File: zabbix-ha-db3-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-96-1
...
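To summarize the ring: each node replicates from the previous one and grants REPLICATION SLAVE to the next one, and zabbix-ha-db1 closes the loop by replicating from zabbix-ha-db3. Here is a sketch that prints this plan for any list of nodes given in ring order (ring_plan is a made-up helper).

```shell
#!/usr/bin/env bash
# Sketch: print the replication plan for nodes listed in ring order.
# Each node's master is the previous node (the first node's master is the
# last one), and each node grants replication to the next node.
ring_plan() {
  local -a nodes=("$@")
  local n=$# i prev next
  for (( i = 0; i < n; i++ )); do
    prev="${nodes[(i - 1 + n) % n]}"
    next="${nodes[(i + 1) % n]}"
    printf '%s master=%s grant_to=%s\n' "${nodes[i]}" "$prev" "$next"
  done
}
```

For the three database nodes in this post, ring_plan 192.168.7.96 192.168.7.97 192.168.7.99 prints the same master_host and grant addresses used in the steps above.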
Now, still on the first server, create the Zabbix database and user:
create database zabbix character set utf8 collate utf8_bin;
grant all privileges on zabbix.* to 'zabbix'@'192.168.7.%' identified by 'Password789';
quit;
We just created an empty Zabbix database, and we will import the schema later from the Zabbix application server.
Setting up the Zabbix servers
On every Zabbix server:
sudo -i
vi /etc/hosts # set the hosts file as mentioned in the start
apt install corosync pacemaker pcs
echo hacluster:Zabbix123 | chpasswd
mv /etc/corosync/corosync.conf /etc/corosync/corosync.conf.orig
# This is the current release file, please check the latest in https://www.zabbix.com/download
wget https://repo.zabbix.com/zabbix/4.4/debian/pool/main/z/zabbix-release/zabbix-release_4.4-1+buster_all.deb
dpkg -i zabbix-release_4.4-1+buster_all.deb
apt update
# We skip MariaDB and snmpd (note the minus characters)
apt install zabbix-server-mysql mariadb-server-10.3- snmpd-
On every Zabbix server, edit /etc/zabbix/zabbix_server.conf:
SourceIP=192.168.7.87
DBHost=192.168.7.89
DBPassword=Password789
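Those three keys can be set with a small helper instead of an editor. This sketch assumes the packaged zabbix_server.conf layout, where defaults appear as commented-out KEY=value lines. set_conf_value is hypothetical. The value reaches awk through the environment, so characters such as & in a password need no escaping.

```shell
#!/usr/bin/env bash
# Sketch: set KEY=VALUE in a Zabbix-style config file.
# - the first line that is KEY=... or a commented-out "# KEY=..." becomes KEY=VALUE
# - later active KEY=... lines are dropped, so the key is set only once
# - if the key is not present at all, KEY=VALUE is appended
set_conf_value() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Pass key and value via the environment: awk -v would interpret backslashes.
  KEY="$key" VAL="$value" awk '
    BEGIN { k = ENVIRON["KEY"]; v = ENVIRON["VAL"]; done = 0 }
    {
      line = $0
      active = (index(line, k "=") == 1)
      sub(/^#[ \t]*/, "", line)
      if (index(line, k "=") == 1) {
        if (!done) { print k "=" v; done = 1; next }
        if (active) next
      }
      print
    }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the original file so its owner and mode are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, run set_conf_value /etc/zabbix/zabbix_server.conf DBHost 192.168.7.89, and do the same for SourceIP and DBPassword.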
On the first server only (zabbix-ha-srv1), import the Zabbix database schema, entering the previously set zabbix user password (Password789) when prompted:
zcat /usr/share/doc/zabbix-server-mysql*/create.sql.gz | mysql -h 192.168.7.96 -u zabbix -p zabbix
Note: If you want to implement Zabbix database table partitioning, this would be the time for that.
Next, still on the first Zabbix server, set up the cluster:
pcs host auth zabbix-ha-srv1 zabbix-ha-srv2 zabbix-ha-srv3 -u hacluster -p Zabbix123
pcs cluster setup zabbix_server_cluster zabbix-ha-srv1 zabbix-ha-srv2 zabbix-ha-srv3 --force
pcs cluster start --all
systemctl enable corosync pacemaker
pcs property set stonith-enabled=false
pcs resource defaults resource-stickiness=100
pcs resource create virtual_ip_server ocf:heartbeat:IPaddr2 ip=192.168.7.87 op monitor interval=5s --group zabbix_server_cluster
pcs resource create ZabbixServer systemd:zabbix-server op monitor interval=10s --group zabbix_server_cluster
pcs constraint colocation add virtual_ip_server with ZabbixServer
pcs constraint order virtual_ip_server then ZabbixServer
# To edit the start/stop timeouts we need to delete them first
pcs resource op delete ZabbixServer start
pcs resource op delete ZabbixServer stop
pcs resource op add ZabbixServer start interval=0s timeout=60s
pcs resource op add ZabbixServer stop interval=0s timeout=120s
Check the cluster status with the pcs status command. Example output:
root@zabbix-ha-srv1:~# pcs status
Cluster name: zabbix_server_cluster
Stack: corosync
Current DC: zabbix-ha-srv2 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Sat Jan 11 19:25:31 2020
Last change: Sat Jan 11 19:21:09 2020 by root via cibadmin on zabbix-ha-srv1
3 nodes configured
2 resources configured
Online: [ zabbix-ha-srv1 zabbix-ha-srv2 zabbix-ha-srv3 ]
Full list of resources:
Resource Group: zabbix_server_cluster
virtual_ip_server (ocf::heartbeat:IPaddr2): Started zabbix-ha-srv1
ZabbixServer (systemd:zabbix-server): Started zabbix-ha-srv1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
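Before moving on, it is worth testing a failover. This sketch assumes the pcs node standby and pcs node unstandby commands of pcs 0.10, the release whose pcs host auth syntax is used throughout this post. failover_drill and its PCS and WAIT overrides are hypothetical, and PCS=echo turns the function into a dry run. With resource-stickiness=100, the resources should stay on the new node after the old one comes back.

```shell
#!/usr/bin/env bash
# Sketch: manual failover test for a pacemaker cluster.
# PCS (default: pcs) and WAIT (seconds, default 30) can be overridden;
# PCS=echo prints the commands instead of running them.
failover_drill() {
  local node="$1"
  local pcs="${PCS:-pcs}"
  # $pcs is left unquoted on purpose so that e.g. PCS="echo pcs" also works.
  $pcs node standby "$node" || return 1   # move resources away from the node
  sleep "${WAIT:-30}"                      # give pacemaker time to relocate them
  $pcs status                              # show where the resources are now
  $pcs node unstandby "$node"              # let the node host resources again
}
```

For example, run failover_drill zabbix-ha-srv1 on any cluster node while the Zabbix server is running on zabbix-ha-srv1.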
Note that we didn’t explicitly enable the zabbix-server service, and it is not enabled by default. The cluster takes care of starting the service on the active node.
Setting up the frontend web servers
On every web server:
sudo -i
vi /etc/hosts # set the hosts file as mentioned in the start
apt install corosync pacemaker pcs
echo hacluster:Zabbix123 | chpasswd
mv /etc/corosync/corosync.conf /etc/corosync/corosync.conf.orig
# This is the current release file, please check the latest in https://www.zabbix.com/download
wget https://repo.zabbix.com/zabbix/4.4/debian/pool/main/z/zabbix-release/zabbix-release_4.4-1+buster_all.deb
dpkg -i zabbix-release_4.4-1+buster_all.deb
apt update
apt install zabbix-frontend-php zabbix-apache-conf apache2
systemctl stop apache2
systemctl disable apache2
On every web server, edit /etc/zabbix/apache.conf to set the time zone in the PHP 7 settings, for example:
...
<IfModule mod_php7.c>
...
php_value date.timezone Europe/Helsinki
</IfModule>
...
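The time zone line can also be set with sed. This sketch assumes GNU sed (for -i) and a packaged apache.conf where the setting is a php_value date.timezone line, possibly commented out. It rewrites every such line, so it also covers any other IfModule section that sets the value. The helper names are hypothetical, and valid_timezone only checks that the zone exists in tzdata.

```shell
#!/usr/bin/env bash
# Sketch: set php_value date.timezone in Zabbix's apache.conf (GNU sed).
# Every line that sets date.timezone, commented out or not, is rewritten
# with its original indentation.
set_php_timezone() {
  local file="$1" tz="$2"
  sed -E -i 's|^([[:space:]]*)#?[[:space:]]*php_value[[:space:]]+date\.timezone[[:space:]].*$|\1php_value date.timezone '"$tz"'|' "$file"
}

# Succeeds if the zone name exists in the tzdata database.
valid_timezone() {
  [ -f "/usr/share/zoneinfo/$1" ]
}
```

For example: valid_timezone Europe/Helsinki && set_php_timezone /etc/zabbix/apache.conf Europe/Helsinki.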
On every web server, create /etc/apache2/conf-available/serverstatus.conf:
Listen 127.0.0.1:8080
<VirtualHost 127.0.0.1:8080>
<Location /server-status>
SetHandler server-status
Require local
</Location>
</VirtualHost>
On every web server, activate the server status configuration:
a2enconf serverstatus
On every web server, edit /etc/apache2/ports.conf and change Listen 80 to include the cluster IP address:
Listen 192.168.7.88:80
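Here is a sketch of the same change with GNU sed. bind_http_to is a made-up helper that only touches a plain Listen 80 line. The Listen 443 lines inside the IfModule blocks of Debian's default ports.conf stay as they are, and running the helper twice changes nothing.

```shell
#!/usr/bin/env bash
# Sketch: replace a plain "Listen 80" with "Listen ADDRESS:80" (GNU sed).
# Other Listen lines (e.g. Listen 443) are left alone, and a second run
# changes nothing because the rewritten line no longer matches.
bind_http_to() {
  local file="$1" addr="$2"
  sed -E -i 's|^Listen[[:space:]]+80[[:space:]]*$|Listen '"$addr"':80|' "$file"
}
```

For example: bind_http_to /etc/apache2/ports.conf 192.168.7.88.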
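On the first web server, let’s now configure the Zabbix frontend. Apache cannot bind to 192.168.7.88 until the cluster has assigned that address to this node, and with only Listen 192.168.7.88:80 it would not answer on 192.168.7.90 either. For this one-time step, keep (or temporarily restore) Listen 80 in /etc/apache2/ports.conf, and change it back to the cluster address before setting up the cluster: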
systemctl start apache2
Using a browser, go to http://192.168.7.90/zabbix/ and configure the Zabbix frontend as prompted. Be sure to enter the database cluster IP address 192.168.7.89 and the Zabbix server cluster IP address 192.168.7.87 when asked.
When the Zabbix frontend has been successfully configured, copy the resulting configuration file /etc/zabbix/web/zabbix.conf.php to the second and third web servers.
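Here is one way to do the copy, as a sketch. It assumes root SSH access from zabbix-ha-fe1 to the other web servers, which this post does not set up. copy_frontend_conf is a hypothetical helper. scp -p keeps the file's mode and timestamps but not its owner, so check ownership on the target if the frontend cannot read the file.

```shell
#!/usr/bin/env bash
# Sketch: copy the generated frontend config to the other web nodes.
# Assumes root SSH access to each target. DRY_RUN=1 only prints the commands.
copy_frontend_conf() {
  local src=/etc/zabbix/web/zabbix.conf.php host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'scp -p %s root@%s:%s\n' "$src" "$host" "$src"
    else
      scp -p "$src" "root@${host}:${src}" || return 1
    fi
  done
}
```

For example, on zabbix-ha-fe1: copy_frontend_conf zabbix-ha-fe2 zabbix-ha-fe3.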
On the first web server, stop Apache and configure the cluster:
systemctl stop apache2
pcs host auth zabbix-ha-fe1 zabbix-ha-fe2 zabbix-ha-fe3 -u hacluster -p Zabbix123
pcs cluster setup zabbix_fe_cluster zabbix-ha-fe1 zabbix-ha-fe2 zabbix-ha-fe3 --force
pcs cluster start --all
systemctl enable corosync pacemaker
pcs property set stonith-enabled=false
pcs resource defaults resource-stickiness=100
pcs resource create virtual_ip_fe ocf:heartbeat:IPaddr2 ip=192.168.7.88 op monitor interval=5s --group zabbix_fe_cluster
pcs resource create zabbix_fe ocf:heartbeat:apache configfile=/etc/apache2/apache2.conf statusurl="http://localhost:8080/server-status" op monitor interval=30s --group zabbix_fe_cluster
pcs constraint colocation add virtual_ip_fe with zabbix_fe
pcs constraint order virtual_ip_fe then zabbix_fe
# To edit the start/stop timeouts we need to delete them first
pcs resource op delete zabbix_fe start
pcs resource op delete zabbix_fe stop
pcs resource op add zabbix_fe start interval=0s timeout=60s
pcs resource op add zabbix_fe stop interval=0s timeout=120s
Finally, on the second and third web servers, enable the cluster services:
systemctl enable corosync pacemaker
You can check the web cluster status with pcs status:
root@zabbix-ha-fe1:~# pcs status
Cluster name: zabbix_fe_cluster
Stack: corosync
Current DC: zabbix-ha-fe3 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Sat Jan 11 20:26:42 2020
Last change: Sat Jan 11 19:33:19 2020 by root via cibadmin on zabbix-ha-fe1
3 nodes configured
2 resources configured
Online: [ zabbix-ha-fe1 zabbix-ha-fe2 zabbix-ha-fe3 ]
Full list of resources:
Resource Group: zabbix_fe_cluster
virtual_ip_fe (ocf::heartbeat:IPaddr2): Started zabbix-ha-fe1
zabbix_fe (ocf::heartbeat:apache): Started zabbix-ha-fe1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
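When scripting checks around the cluster, it can be handy to extract the node that currently runs a given resource from the pcs status output. A minimal sketch using awk, assuming the line format shown above (resource name in the first column, "Started" and the node name in the last two); `active_node` is a hypothetical function name, and the output format may differ between pcs and Pacemaker versions.

```shell
# Hypothetical helper: print the node on which a resource is "Started",
# reading "pcs status" output on stdin. Assumes lines shaped like
#   virtual_ip_fe (ocf::heartbeat:IPaddr2): Started zabbix-ha-fe1
# awk splits on any run of spaces or tabs, so indentation does not matter.
active_node() {
    awk -v res="$1" '$1 == res && $(NF-1) == "Started" { print $NF }'
}

# Usage on any cluster node:
#   pcs status | active_node virtual_ip_fe
```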
You can also see which addresses and ports the active web server is listening on:
root@zabbix-ha-fe1:~# ss -ntul
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
udp UNCONN 0 0 192.168.7.90:5405 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
tcp LISTEN 0 128 127.0.0.1:8080 0.0.0.0:*
tcp LISTEN 0 128 192.168.7.88:80 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:2224 0.0.0.0:*
tcp LISTEN 0 128 [::]:22 [::]:*
tcp LISTEN 0 128 [::]:2224 [::]:*
As you can see, the server status service (port 8080) is only listening on the localhost address 127.0.0.1, and the web server on port 80 is listening on the cluster IP address 192.168.7.88. Enabling IPv6 and/or TLS connectivity on the web server is left as an exercise for the reader.
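The same check can be automated by filtering the ss -ntul output. A minimal sketch, assuming the column layout shown above (Local Address:Port in the fifth column); `listens_on` is a hypothetical function name.

```shell
# Hypothetical helper: succeed (exit 0) if a TCP socket is in LISTEN state
# on exactly ADDR:PORT. Reads "ss -ntul" output on stdin and assumes the
# "Local Address:Port" value is the fifth whitespace-separated column.
listens_on() {
    awk -v want="$1" '$1 == "tcp" && $2 == "LISTEN" && $5 == want { found = 1 }
                      END { exit !found }'
}

# Usage on the active front-end node:
#   ss -ntul | listens_on 192.168.7.88:80 && echo "web server bound to the VIP"
#   ss -ntul | listens_on 127.0.0.1:8080 && echo "status page only on localhost"
```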
That’s it
As Edmunds said, this was just the bare minimum, but a good start anyway.
Some cluster commands useful in troubleshooting or management:
man pcs
pcs config
pcs node standby # see "pcs node --help"
pcs node unstandby
pcs quorum status
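The standby commands above can also serve as a simple manual failover test: put the active node in standby, check where the resources moved, then bring the node back. A minimal sketch, assuming it runs as root on a cluster node; `failover_test`, `run_or_echo`, the `DRY_RUN` variable and the 10-second wait are hypothetical choices, not part of pcs.

```shell
# Hypothetical helper for a manual failover test of one cluster node.
# Set DRY_RUN=1 to print the pcs commands instead of running them.
run_or_echo() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

failover_test() {
    node=$1                                # e.g. zabbix-ha-fe1
    run_or_echo pcs node standby "$node" || return 1
    [ "${DRY_RUN:-0}" = 1 ] || sleep 10    # give Pacemaker time to move resources
    run_or_echo pcs status resources       # resources should now run elsewhere
    run_or_echo pcs node unstandby "$node"
}

# Usage on any cluster node:
#   failover_test zabbix-ha-fe1
```

Because resource-stickiness was set to 100 earlier, the resources are expected to stay on their new node after unstandby instead of moving back.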
See also: ClusterLabs
Hi, can you help me, please? I’m getting the error below while running:
# pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.7.89 op monitor interval=5s –group zabbix_db_cluster
Error: When using ‘op’ you must specify an operation name and at least one option
What pcs version are you using? You can see my version in the beginning of the post.
Hi, I cannot start the apache server after including the cluster IP into the listening port
Listen 192.168.7.88:80
The apache start failed with the error:
(99)Cannot assign requested address: AH00072: make_sock: could not bind to address 192.168.7.88:80
no listening sockets available, shutting down
Could you help? thanks!
Hi, this most probably means that something is already listening on port 80. First, make sure that Apache is really not running (= stop it completely). Then, check that your Apache configuration files don’t have any other Listen statements with port 80 (for example “Listen *:80”), and start Apache. If you don’t know which software is using port 80, use “sudo ss -ntlp” to show the listening ports and their processes.
Hi, can I create this 9-server HA setup in Docker?
Thanks for keeping these instructions up. They have been quite helpful. I have everything set up, but still have the dreaded “MySQL server has gone away” message. pcs status is showing everything operational and I am able to connect using mysql-client from every server to the database with the zabbix user. I have spent the past 4 hours reading every article on the Zabbix forums, and even reviewing everything in the MySQL documentation. I even went so far as to rebuild servers, but I always get to the same spot. FML.
But the error is all mine somewhere, and I will eventually figure it out. I at least have it working properly at the office. My only issue is my home setup.
Love the great instructions you provided.