最近都在弄postgresql
備份、還原測試得差不多了,就等著看到時候要用什麼方式
前幾天看到 pg_auto_failover 這個postgresql 的extension
https://github.com/citusdata/pg_auto_failover
感覺挺不錯的,看起來設定很簡單,雖然之前已經測試了 keepalived 做 HA
不過,反正當作練功嘛,多測試一套也不錯!
<!--more-->
基本的邏輯是 一台 monitor , 一台 master/primary node ,一台 slave/secondary node 組成一個cluster
官方提供的架構圖如下
當master/primary node 上面的 postgresql 服務死掉了,slave/secondary node 會自動接手
等到master/primary node 回來之後,會自動降級為 slave/secondary node
而原本的 slave/secondary node 就會變成 master/primary
安裝相關套件
以下步驟在三個node 都要操作
安裝相依套件
sudo apt install make libssl1.0.0 libssl-dev libkrb5-3 libkrb5-dev libpq5 libpq-dev pgdg-keyring ssl-cert postgresql-plperl-11 postgresql-pltcl-11 postgresql-plpython-11 postgresql-plpython3-11 postgresql-11 postgresql-common postgresql-server-dev-11 postgresql-client-11 postgresql-client-common postgresql-doc-11 tcl8.6 libjson-perl tcl-tclreadline
安裝完成後,切換到postgres身份,把原本系統內的postgresql 檔存放目錄還有資料庫檔案目錄都清掉
sudo su - postgres
mv /etc/postgresql/11 /etc/postgresql/11.bak
mv /var/lib/postgresql/11 /var/lib/postgresql/11.bak
exit
開始安裝 pg_auto_failover
curl https://install.citusdata.com/community/deb.sh | sudo bash
sudo apt install postgresql-11-auto-failover -y
postgresql 的相關執行檔路徑,預設不會載入到PATH變數中,所以要自己手動增加
直接加入 /etc/environment ,或者是去修改使用者的 .profile 載入正確的 $PATH 都可以
#replace PATH variable in /etc/environment
PATH="/usr/lib/postgresql/11/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
因為要用 pg_auto_failover 來控制postgresql ,所以要把系統內建的服務先停掉,並且設定開啟不啟動
sudo systemctl disable postgresql
sudo systemctl stop postgresql
為了讓三台機器可以直接用hostname溝通,所以要修改 /etc/hosts
#add to /etc/hosts
192.168.11.151 monitor monitor.abc.com
192.168.11.152 pg-primary pg-primary.abc.com
192.168.11.153 pg-slave pg-slave.abc.com
以上完成在三台node上,安裝postgresql/pg_auto_failover 的工作
設定 monitor node
在monitor node操作
建立monitor node
sudo runuser -l postgres -c "/usr/bin/pg_autoctl create monitor --pgdata /var/lib/postgresql/11/main --pgport 5432 --nodename monitor"
這個步驟會在pg_hba.conf 中,新增一筆紀錄
host "pg_auto_failover" "autoctl_node" 192.168.11.0/24 trust # Auto-generated by pg_auto_failover
檢查一下狀態,看看有沒有安裝成功
2019-09-19 06:56:09 [administrator@monitor ~]$ sudo runuser -l postgres -c "psql -c '\'du"
List of roles
Role name | Attributes | Member of
--------------+------------------------------------------------------------+-----------
autoctl | | {}
autoctl_node | | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
2019-09-19 06:56:15 [administrator@monitor ~]$ sudo runuser -l postgres -c "psql -c '\'l"
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
------------------+----------+-----------+---------+-------+-----------------------
pg_auto_failover | autoctl | SQL_ASCII | C | C |
postgres | postgres | SQL_ASCII | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
OK,看起來沒有問題
啟動pg_auto_failover monitor
2019-09-19 06:56:19 [administrator@monitor ~]$ sudo runuser -l postgres -c "pg_autoctl run --pgdata /var/lib/postgresql/11/main"
06:57:38 INFO Managing PostgreSQL installation at "/var/lib/postgresql/11/main"
06:57:38 INFO PostgreSQL is running in "/var/lib/postgresql/11/main" on port 5432
06:57:38 INFO The version of extenstion "pgautofailover" is "1.0" on the monitor
06:57:38 INFO pg_auto_failover monitor is ready at postgres://autoctl_node@monitor:5432/pg_auto_failover
06:57:38 INFO Contacting the monitor to LISTEN to its events
這個指令會停留在console畫面上,要不就是在最後加上 &
放去背景執行
要不就是用systemd/supervisor 來控制,這個後面再來改
就先放著讓他跑,還可以順便觀察cluster變動時的狀態
再開一個terminal 跑底下的指令,產生monitor node 的URI ,在建立maser/slave node 的時候會用到
2019-09-19 06:58:39 [administrator@monitor ~]$ sudo runuser -l postgres -c "pg_autoctl show uri --pgdata /var/lib/postgresql/11/main"
postgres://autoctl_node@monitor:5432/pg_auto_failover
2019-09-19 06:59:01 [administrator@monitor ~]$
以上完成monitor node 的準備工作
設定master/primary node
建立pg_auto_failover master/primary node
2019-09-19 15:04:16 [administrator@pg-primary ~]$ sudo runuser -l postgres -c "pg_autoctl create postgres --pgdata /var/lib/postgresql/11/main --pgport 5432 --nodename pg-primary --monitor postgres://autoctl_node@monitor:5432/pg_auto_failover"
15:04:19 INFO Found pg_ctl for PostgreSQL 11.5 at /usr/lib/postgresql/11/bin/pg_ctl
15:04:19 INFO Registered node pg-primary:5432 with id 1 in formation "default", group 0.
15:04:19 INFO Writing keeper init state file at "/var/lib/postgresql/.local/share/pg_autoctl/var/lib/postgresql/11/main/pg_autoctl.init"
15:04:19 INFO Successfully registered as "single" to the monitor.
15:04:21 INFO Initialising a PostgreSQL cluster at "/var/lib/postgresql/11/main"
15:04:21 INFO Postgres is not running, starting postgres
15:04:21 INFO /usr/lib/postgresql/11/bin/pg_ctl --pgdata /var/lib/postgresql/11/main --options "-p 5432" --options "-h *" --wait start
15:04:21 INFO CREATE DATABASE postgres;
15:04:21 INFO The database "postgres" already exists, skipping.
15:04:21 INFO FSM transition from "init" to "single": Start as a single node
15:04:21 INFO Initialising postgres as a primary
15:04:21 INFO Transition complete: current state is now "single"
15:04:21 INFO Keeper has been succesfully initialized.
2019-09-19 15:04:21 [administrator@pg-primary ~]$
啟動 pg_auto_failover master/primary node
2019-09-19 15:14:34 [administrator@pg-primary ~]$ sudo runuser -l postgres -c "pg_autoctl run --pgdata /var/lib/postgresql/11/main"
15:15:11 INFO Managing PostgreSQL installation at "/var/lib/postgresql/11/main"
15:15:11 INFO The version of extenstion "pgautofailover" is "1.0" on the monitor
15:15:11 INFO pg_autoctl service is starting
15:15:11 INFO Calling node_active for node default/1/0 with current state: single, PostgreSQL is running, sync_state is "", WAL delta is -1.
15:15:16 INFO Calling node_active for node default/1/0 with current state: single, PostgreSQL is running, sync_state is "", WAL delta is -1.
15:15:21 INFO Calling node_active for node default/1/0 with current state: single, PostgreSQL is running, sync_state is "", WAL delta is -1.
一樣,執行完之後,會停留在畫面上,所以開另一個視窗來執行以下檢查的指令
2019-09-19 15:17:08 [administrator@pg-primary ~]$ sudo runuser -l postgres -c "pg_autoctl show state --pgdata /var/lib/postgresql/11/main"
Name | Port | Group | Node | Current State | Assigned State
-----------+--------+-------+-------+-------------------+------------------
pg-primary | 5432 | 0 | 1 | single | single
2019-09-19 15:17:16 [administrator@pg-primary ~]$
以上,master/primary node 設定結束
####設定 slave/secondary node
建立 pg_auto_failover slave/secondary node
2019-09-19 07:04:08 [administrator@pg-slave ~]$ sudo runuser -l postgres -c "pg_autoctl create postgres --pgdata /var/lib/postgresql/11/main --pgport 5432 --nodename pg-slave --monitor postgres://autoctl_node@monitor:5432/pg_auto_failover"
07:19:57 INFO Found pg_ctl for PostgreSQL 11.5 at /usr/lib/postgresql/11/bin/pg_ctl
07:19:57 INFO Registered node pg-slave:5432 with id 2 in formation "default", group 0.
07:19:57 INFO Writing keeper init state file at "/var/lib/postgresql/.local/share/pg_autoctl/var/lib/postgresql/11/main/pg_autoctl.init"
07:19:57 INFO Successfully registered as "wait_standby" to the monitor.
07:19:57 INFO FSM transition from "init" to "wait_standby": Start following a primary
07:19:57 INFO Transition complete: current state is now "wait_standby"
07:20:02 INFO FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
07:20:02 INFO The primary node returned by the monitor is pg-primary:5432
07:20:02 INFO Initialising PostgreSQL as a hot standby
07:20:02 INFO Running /usr/lib/postgresql/11/bin/pg_basebackup -w -h pg-primary -p 5432 --pgdata /var/lib/postgresql/11/backup -U pgautofailover_replicator --write-recovery-conf --max-rate 100M --wal-method=stream --slot pgautofailover_standby ...
07:20:04 INFO pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
0/23751 kB (0%), 0/1 tablespace (...ostgresql/11/backup/backup_label)
13103/23751 kB (55%), 0/1 tablespace (...gresql/11/backup/base/13091/2602)
23761/23761 kB (100%), 0/1 tablespace (...esql/11/backup/global/pg_control)
23761/23761 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/20000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
07:20:04 INFO Postgres is not running, starting postgres
07:20:04 INFO /usr/lib/postgresql/11/bin/pg_ctl --pgdata /var/lib/postgresql/11/main --options "-p 5432" --options "-h *" --wait start
07:20:04 INFO PostgreSQL started on port 5432
07:20:04 INFO Transition complete: current state is now "catchingup"
07:20:04 INFO Keeper has been succesfully initialized.
2019-09-19 07:20:04 [administrator@pg-slave ~]$
可以看到在建立slave/secondary node 的時候,就會開始第一次的同步
**啟動 pg_auto_failover slave/secondary node
2019-09-19 07:20:04 [administrator@pg-slave ~]$ sudo runuser -l postgres -c "pg_autoctl run --pgdata /var/lib/postgresql/11/main"
07:21:33 INFO Managing PostgreSQL installation at "/var/lib/postgresql/11/main"
07:21:33 INFO The version of extenstion "pgautofailover" is "1.0" on the monitor
07:21:33 INFO pg_autoctl service is starting
07:21:33 INFO Calling node_active for node default/2/0 with current state: catchingup, PostgreSQL is running, sync_state is "", WAL delta is -1.
07:21:33 INFO FSM transition from "catchingup" to "secondary": Convinced the monitor that I'm up and running, and eligible for promotion again
07:21:33 INFO Transition complete: current state is now "secondary"
07:21:34 INFO Calling node_active for node default/2/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
07:21:39 INFO Calling node_active for node default/2/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
開新視窗確認狀態
2019-09-19 07:22:03 [administrator@pg-slave ~]$ sudo runuser -l postgres -c "pg_autoctl show state --pgdata /var/lib/postgresql/11/main"
Name | Port | Group | Node | Current State | Assigned State
-----------+--------+-------+-------+-------------------+------------------
pg-primary | 5432 | 0 | 1 | primary | primary
pg-slave | 5432 | 0 | 2 | secondary | secondary
2019-09-19 07:22:45 [administrator@pg-slave ~]$
正常的話,應該就是會出現兩個 node ,一個 pg-primary 一個pg-slave
generate URI for applications
這個步驟是用來產生給client/applications 連線用的連線字串(connection string)
2019-09-19 07:29:56 [administrator@pg-slave ~]$ sudo runuser -l postgres -c "pg_autoctl show uri --formation default --pgdata /var/lib/postgresql/11/main"
postgres://pg-slave:5432,pg-primary:5432/postgres?target_session_attrs=read-write
2019-09-19 07:30:06 [administrator@pg-slave ~]$
以後就都通過這個URI來存取這個cluster
切記,不可以用系統內建的postgresql service,一定要用 pg_auto_failover 來啟動DB
以上 slave/secondary node 設定完成
設定上並不複雜,比起keepalived 要簡單太多了..
那就要評估看看在異常狀況發生,切換資料庫時的表現了
接下來就繼續測試auto failover 的功能!
測試 auto failover
首先,在 pg-prmiary node 上,透過上面產生的URI來進入psql
然後建立一個測試db、建立一個測試table,插入幾筆資料
2019-09-20 10:58:21 [administrator@pg-primary ~]$ sudo runuser -l postgres -c "psql postgres://pg-slave:5432,pg-primary:5432/postgres?target_session_attrs=read-write"
psql (11.5 (Ubuntu 11.5-1.pgdg18.04+1))
Type "help" for help.
postgres=# create database testdb_1058;
CREATE DATABASE
postgres=# create table testtbl (serial int);
CREATE TABLE
postgres=# insert into testtbl values (111);
INSERT 0 1
postgres=# insert into testtbl values (222);
INSERT 0 1
postgres=# insert into testtbl values (3);
INSERT 0 1
postgres=# insert into testtbl values (444);
INSERT 0 1
postgres=# select * from testtbl;
serial
--------
111
222
3
444
(4 rows)
postgres=# \q
2019-09-20 10:59:30 [administrator@pg-primary ~]$
然後切換到 pg-slave ,一樣透過URI進入psql 撈看看資料
2019-09-20 03:02:12 [administrator@pg-slave ~]$ sudo runuser -l postgres -c "psql postgres://pg-slave:5432,pg-primary:5432/postgres?target_session_attrs=read-write"
psql (11.5 (Ubuntu 11.5-1.pgdg18.04+1))
Type "help" for help.
postgres=# select * from testtbl;
serial
--------
111
222
3
444
(4 rows)
postgres=#
確認透過URI的確是可以正常的存取資料
在兩台單機上(master/slave)上也都可以看到同樣的資料,代表資料有正確的被同步到兩台node上
pg-slave
2019-09-20 03:03:06 [administrator@pg-slave ~]$ sudo runuser -l postgres -c psql
psql (11.5 (Ubuntu 11.5-1.pgdg18.04+1))
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-------------+----------+-----------+---------+-------+-----------------------
nexus | postgres | SQL_ASCII | C | C |
postgres | postgres | SQL_ASCII | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
testdb_1058 | postgres | SQL_ASCII | C | C |
(5 rows)
postgres=# select * from testtbl;
serial
--------
111
222
3
444
(4 rows)
postgres=#
pg-master
2019-09-20 10:59:30 [administrator@pg-primary ~]$ sudo runuser -l postgres -c psql
psql (11.5 (Ubuntu 11.5-1.pgdg18.04+1))
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-------------+----------+-----------+---------+-------+-----------------------
nexus | postgres | SQL_ASCII | C | C |
postgres | postgres | SQL_ASCII | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
testdb_1058 | postgres | SQL_ASCII | C | C |
(5 rows)
postgres=# select * from testtbl;
serial
--------
111
222
3
444
(4 rows)
postgres=#
接著來把本來是master的機器關掉,看看會有什麼變化
首先,在pg-slave 這台機器上,就會看到cluster 有狀況,primary 不見了
03:06:02 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
03:06:08 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
03:06:13 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
03:06:18 ERROR PostgreSQL cannot reach the primary server: the system view pg_stat_wal_receiver has no rows.
03:06:18 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is -1.
03:06:23 ERROR PostgreSQL cannot reach the primary server: the system view pg_stat_wal_receiver has no rows.
03:06:23 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is -1.
03:06:28 ERROR PostgreSQL cannot reach the primary server: the system view pg_stat_wal_receiver has no rows.
03:06:28 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is -1.
03:06:33 ERROR PostgreSQL cannot reach the primary server: the system view pg_stat_wal_receiver has no rows.
03:06:33 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is -1.
03:06:38 ERROR PostgreSQL cannot reach the primary server: the system view pg_stat_wal_receiver has no rows.
03:06:38 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is -1.
這時候在pg-slave 還能不能透過URI 連線進去psql操作?
2019-09-20 03:08:06 [administrator@pg-slave ~]$ sudo runuser -l postgres -c "psql postgres://pg-slave:5432,pg-primary:5432/postgres?target_session_attrs=read-write"
psql (11.5 (Ubuntu 11.5-1.pgdg18.04+1))
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-------------+----------+-----------+---------+-------+-----------------------
nexus | postgres | SQL_ASCII | C | C |
postgres | postgres | SQL_ASCII | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
testdb_1058 | postgres | SQL_ASCII | C | C |
(5 rows)
postgres=# create database testdb_1108;
CREATE DATABASE
postgres=# create table testtbl_1108 (serial int);
CREATE TABLE
postgres=# insert into testtbl_1108 values (999);
INSERT 0 1
postgres=# insert into testtbl_1108 values (888);
INSERT 0 1
postgres=# insert into testtbl_1108 values (77);
INSERT 0 1
postgres=# insert into testtbl_1108 values (66);
INSERT 0 1
postgres=# insert into testtbl_1108 values (55);
INSERT 0 1
postgres=# \q
2019-09-20 03:09:34 [administrator@pg-slave ~]$
很好,看起來沒有問題,這時候把 pg-primary 打開,執行啟動 pg_auto_failover node 的指令,看看會發生什麼事
2019-09-20 11:12:13 [administrator@pg-primary ~]$ sudo runuser -l postgres -c "pg_autoctl run"
11:12:29 INFO Managing PostgreSQL installation at "/var/lib/postgresql/11/main"
11:12:29 INFO The version of extenstion "pgautofailover" is "1.0" on the monitor
11:12:29 INFO pg_autoctl service is starting
11:12:29 INFO Calling node_active for node default/1/0 with current state: primary, PostgreSQL is not running, sync_state is "", WAL delta is -1.
11:12:29 INFO Postgres is not running, starting postgres
11:12:29 INFO /usr/lib/postgresql/11/bin/pg_ctl --pgdata /var/lib/postgresql/11/main --options "-p 5432" --options "-h *" --wait start
11:12:30 WARN PostgreSQL was not running, restarted with pid 1407
11:12:30 INFO FSM transition from "primary" to "demoted": A failover occurred, no longer primary
11:12:30 INFO Transition complete: current state is now "demoted"
11:12:30 INFO Calling node_active for node default/1/0 with current state: demoted, PostgreSQL is not running, sync_state is "", WAL delta is -1.
11:12:30 INFO FSM transition from "demoted" to "catchingup": A new primary is available. First, try to rewind. If that fails, do a pg_basebackup.
11:12:30 INFO The primary node returned by the monitor is pg-slave:5432
11:12:30 INFO Rewinding PostgreSQL to follow new primary pg-slave:5432
11:12:30 ERROR Connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
11:12:30 ERROR Failed to get the postgresql.conf path from the local postgres server, see above for details
11:12:30 WARN Failed to rewind demoted primary to standby, trying pg_basebackup instead
11:12:30 INFO Initialising PostgreSQL as a hot standby
11:12:30 INFO Target directory exists: "/var/lib/postgresql/11/main", stopping PostgreSQL
11:12:30 INFO pg_ctl: no server running
11:12:30 INFO pg_ctl stop failed, but PostgreSQL is not running anyway
11:12:30 INFO Running /usr/lib/postgresql/11/bin/pg_basebackup -w -h pg-slave -p 5432 --pgdata /var/lib/postgresql/11/backup -U pgautofailover_replicator --write-recovery-conf --max-rate 100M --wal-method=stream --slot pgautofailover_standby ...
11:12:33 INFO pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/12000028 on timeline 6
pg_basebackup: starting background WAL receiver
0/46937 kB (0%), 0/1 tablespace (...ostgresql/11/backup/backup_label)
46947/46947 kB (100%), 0/1 tablespace (...esql/11/backup/global/pg_control)
46947/46947 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/120000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
11:12:33 INFO Postgres is not running, starting postgres
11:12:33 INFO /usr/lib/postgresql/11/bin/pg_ctl --pgdata /var/lib/postgresql/11/main --options "-p 5432" --options "-h *" --wait start
11:12:33 INFO PostgreSQL started on port 5432
11:12:33 INFO Drop replication slot "pgautofailover_standby"
11:12:33 INFO Transition complete: current state is now "catchingup"
11:12:33 INFO Calling node_active for node default/1/0 with current state: catchingup, PostgreSQL is running, sync_state is "", WAL delta is -1.
11:12:38 INFO Calling node_active for node default/1/0 with current state: catchingup, PostgreSQL is running, sync_state is "", WAL delta is -1.
11:12:43 INFO Calling node_active for node default/1/0 with current state: catchingup, PostgreSQL is running, sync_state is "", WAL delta is -1.
11:12:48 INFO Calling node_active for node default/1/0 with current state: catchingup, PostgreSQL is running, sync_state is "", WAL delta is -1.
11:12:48 INFO FSM transition from "catchingup" to "secondary": Convinced the monitor that I'm up and running, and eligible for promotion again
11:12:48 INFO Transition complete: current state is now "secondary"
11:12:48 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
11:12:53 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
11:12:58 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
11:13:03 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
可以看到 pg_auto_failover 偵測到了cluster node 有意異動,開始同步資料,然後切換角色,從primary變成了 secondary
同一時間,在pg-slave 的訊息也有變動
03:12:34 INFO Calling node_active for node default/4/0 with current state: wait_primary, PostgreSQL is running, sync_state is "async", WAL delta is 0.
03:12:39 INFO Calling node_active for node default/4/0 with current state: wait_primary, PostgreSQL is running, sync_state is "async", WAL delta is 0.
03:12:45 INFO Calling node_active for node default/4/0 with current state: wait_primary, PostgreSQL is running, sync_state is "async", WAL delta is 0.
03:12:50 INFO Calling node_active for node default/4/0 with current state: wait_primary, PostgreSQL is running, sync_state is "async", WAL delta is 0.
03:12:50 INFO FSM transition from "wait_primary" to "primary": A healthy secondary appeared
03:12:50 INFO Enabling synchronous replication
03:12:50 INFO Transition complete: current state is now "primary"
03:12:50 INFO Calling node_active for node default/4/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
03:12:55 INFO Calling node_active for node default/4/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
pg-slave 的狀態從原本的secondary 變成了 primary
來檢查看看
2019-09-20 03:19:04 [administrator@pg-slave ~]$ sudo runuser -l postgres -c "pg_autoctl show state"
Name | Port | Group | Node | Current State | Assigned State
-----------+--------+-------+-------+-------------------+------------------
pg-primary | 5432 | 0 | 1 | secondary | secondary
pg-slave | 5432 | 0 | 4 | primary | primary
2019-09-20 03:19:07 [administrator@pg-slave ~]$
跟上面第一次執行的結果不同了,兩台角色互換了,這也是預期中的結果
那在pg-master 離線期間的異動資料呢?在 pg-master上查得到嗎?
先透過 URI存取DB 檢查看看
2019-09-20 11:12:16 [administrator@pg-primary ~]$ sudo runuser -l postgres -c "psql postgres://pg-slave:5432,pg-primary:5432/postgres?target_session_attrs=read-write"
psql (11.5 (Ubuntu 11.5-1.pgdg18.04+1))
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-------------+----------+-----------+---------+-------+-----------------------
nexus | postgres | SQL_ASCII | C | C |
postgres | postgres | SQL_ASCII | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
testdb_1058 | postgres | SQL_ASCII | C | C |
testdb_1108 | postgres | SQL_ASCII | C | C |
(6 rows)
postgres=# select * from testtbl_1108;
serial
--------
999
888
77
66
55
(5 rows)
postgres=#
GOOD!看來資料也同步過來了
再來看看單機的狀態
2019-09-20 11:22:07 [administrator@pg-primary ~]$ sudo runuser -l postgres -c "psql"
psql (11.5 (Ubuntu 11.5-1.pgdg18.04+1))
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-------------+----------+-----------+---------+-------+-----------------------
nexus | postgres | SQL_ASCII | C | C |
postgres | postgres | SQL_ASCII | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
testdb_1058 | postgres | SQL_ASCII | C | C |
testdb_1108 | postgres | SQL_ASCII | C | C |
(6 rows)
postgres=# select * from testtbl_1108;
serial
--------
999
888
77
66
55
(5 rows)
postgres=#
很好,確認資料有正確的抄寫過來!
以上測試可以發現 pg_auto_failover 這個postgresql的extension 還真的很好用!
設定簡單、快速、方便,設定完成後也不用傷腦筋,反正有異常,會自動幫你搞定master-slave之間的主從關係
資料也都會自動同步好,真的是非常推薦啊!
UPDATE
將 pg_auto_failover 用 systemd 管理
每次都要手動執行 pg_autoctl run 太麻煩了,應該用systemd 或者是 supervisor 來管理
而pg_auto_failover 有指令可以做成 systemd 的service 檔案
既然人家都做好了,當然就直接用 systemd 來做
執行以下指令,產生 systemd configuration
sudo runuser -l postgres -c "pg_autoctl -q show systemd" | sudo tee /etc/systemd/system/pgautofailover.service
在/etc/systemd/system底下會多出一個 pgautofailover.service檔案
內容就是 pg_auto_failover 所提供的 systemd config
接下來就可以直接用 sudo service pgautofailover start 來啟動
沒有什麼問題,不過呢,這樣的作法,會在 /var/log/syslog 塞滿了 pg_auto_failover 的紀錄
Sep 20 14:59:34 pg-primary systemd[1]: Started pg_auto_failover.
Sep 20 14:59:34 pg-primary pg_autoctl[7817]: 14:59:34 INFO Managing PostgreSQL installation at "/var/lib/postgresql/11/main"
Sep 20 14:59:34 pg-primary pg_autoctl[7817]: 14:59:34 INFO Found a stale pidfile at "/tmp/pg_autoctl/var/lib/postgresql/11/main/pg_autoctl.pid"
Sep 20 14:59:34 pg-primary pg_autoctl[7817]: 14:59:34 WARN Removing the stale pid file "/tmp/pg_autoctl/var/lib/postgresql/11/main/pg_autoctl.pid"
Sep 20 14:59:34 pg-primary pg_autoctl[7817]: 14:59:34 INFO The version of extenstion "pgautofailover" is "1.0" on the monitor
Sep 20 14:59:34 pg-primary pg_autoctl[7817]: 14:59:34 INFO pg_autoctl service is starting
Sep 20 14:59:34 pg-primary pg_autoctl[7817]: 14:59:34 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 14:59:39 pg-primary pg_autoctl[7817]: 14:59:39 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 14:59:44 pg-primary pg_autoctl[7817]: 14:59:44 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 14:59:49 pg-primary pg_autoctl[7817]: 14:59:49 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 14:59:54 pg-primary pg_autoctl[7817]: 14:59:54 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 14:59:59 pg-primary pg_autoctl[7817]: 14:59:59 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:04 pg-primary pg_autoctl[7817]: 15:00:04 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:09 pg-primary pg_autoctl[7817]: 15:00:09 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:15 pg-primary pg_autoctl[7817]: 15:00:14 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:20 pg-primary pg_autoctl[7817]: 15:00:20 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:25 pg-primary pg_autoctl[7817]: 15:00:25 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:30 pg-primary pg_autoctl[7817]: 15:00:30 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:35 pg-primary pg_autoctl[7817]: 15:00:35 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:40 pg-primary pg_autoctl[7817]: 15:00:40 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:45 pg-primary pg_autoctl[7817]: 15:00:45 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:50 pg-primary pg_autoctl[7817]: 15:00:50 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:00:55 pg-primary pg_autoctl[7817]: 15:00:55 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:00 pg-primary pg_autoctl[7817]: 15:01:00 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:05 pg-primary pg_autoctl[7817]: 15:01:05 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:10 pg-primary pg_autoctl[7817]: 15:01:10 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:15 pg-primary pg_autoctl[7817]: 15:01:15 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:20 pg-primary pg_autoctl[7817]: 15:01:20 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:25 pg-primary pg_autoctl[7817]: 15:01:25 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:30 pg-primary pg_autoctl[7817]: 15:01:30 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:35 pg-primary pg_autoctl[7817]: 15:01:35 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:40 pg-primary pg_autoctl[7817]: 15:01:40 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:45 pg-primary pg_autoctl[7817]: 15:01:45 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:50 pg-primary pg_autoctl[7817]: 15:01:50 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:01:55 pg-primary pg_autoctl[7817]: 15:01:55 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:00 pg-primary pg_autoctl[7817]: 15:02:00 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:05 pg-primary pg_autoctl[7817]: 15:02:05 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:10 pg-primary pg_autoctl[7817]: 15:02:10 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:15 pg-primary pg_autoctl[7817]: 15:02:15 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:20 pg-primary pg_autoctl[7817]: 15:02:20 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:25 pg-primary pg_autoctl[7817]: 15:02:25 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:30 pg-primary pg_autoctl[7817]: 15:02:30 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:35 pg-primary pg_autoctl[7817]: 15:02:35 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:40 pg-primary pg_autoctl[7817]: 15:02:40 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:46 pg-primary pg_autoctl[7817]: 15:02:46 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:51 pg-primary pg_autoctl[7817]: 15:02:51 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:02:56 pg-primary pg_autoctl[7817]: 15:02:56 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:01 pg-primary pg_autoctl[7817]: 15:03:01 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:06 pg-primary pg_autoctl[7817]: 15:03:06 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:11 pg-primary pg_autoctl[7817]: 15:03:11 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:16 pg-primary pg_autoctl[7817]: 15:03:16 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:21 pg-primary pg_autoctl[7817]: 15:03:21 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:26 pg-primary pg_autoctl[7817]: 15:03:26 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:31 pg-primary pg_autoctl[7817]: 15:03:31 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:36 pg-primary pg_autoctl[7817]: 15:03:36 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:41 pg-primary pg_autoctl[7817]: 15:03:41 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:46 pg-primary pg_autoctl[7817]: 15:03:46 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:51 pg-primary pg_autoctl[7817]: 15:03:51 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:03:57 pg-primary pg_autoctl[7817]: 15:03:57 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:02 pg-primary pg_autoctl[7817]: 15:04:02 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:07 pg-primary pg_autoctl[7817]: 15:04:07 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:12 pg-primary pg_autoctl[7817]: 15:04:12 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:17 pg-primary pg_autoctl[7817]: 15:04:17 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:22 pg-primary pg_autoctl[7817]: 15:04:22 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:27 pg-primary pg_autoctl[7817]: 15:04:27 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:32 pg-primary pg_autoctl[7817]: 15:04:32 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:37 pg-primary pg_autoctl[7817]: 15:04:37 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:42 pg-primary pg_autoctl[7817]: 15:04:42 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:47 pg-primary pg_autoctl[7817]: 15:04:47 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
Sep 20 15:04:52 pg-primary pg_autoctl[7817]: 15:04:52 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
這樣下去不用多久,syslog 一定爆,而且嚴重干擾到其他的系統訊息
所以要修改一下 config ,把log 寫到另外的檔案去
在/etc/systemd/system/pgautofailover.service 加入底下兩行
StandardOutput=file:/var/log/pgautofailover.log
StandardError=file:/var/log/pgautofailover-error.log
然後 sudo systemctl daemon-reload 接著重起 sudo service pgautofailover restart
就會把訊息都寫到 /var/log/pgautofailover.log , /var/log/pgautofailover-error.log
再用 logrotate 來管理就可以了
這個動作需要在三台node都執行
不過我在想應該是可以不要產生那麼多log ,只要紀錄 critical 就好了?來試試看加入 LogLevelMax=3 的效果
在 /etc/systemd/system/pgautofailover.service 檔案中的service 區段 加入
LogLevelMax=3
loglevel 的等級定義看這邊
https://www.ctrl.blog/entry/systemd-log-levels.html
- emergency (0)
- alert (1)
- critical (2)
- error (3)
- warning (4)
- notice (5)
- info (6)
- debug (7)
完成後,一樣執行
sudo systemctl daemon-reload
sudo service pgautofailover restart
看一下檔案內容,怎麼就沒有訊息了!
2019-09-20 15:23:15 [administrator@pg-primary ~]$ sudo tail -30 /var/log/pgautofailover.log
15:14:22 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:14:27 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:14:32 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:14:37 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:14:42 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:14:47 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:14:52 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:14:57 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:02 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:07 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:12 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:17 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:22 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:27 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:32 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:37 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:42 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:47 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:52 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:15:58 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:16:03 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:16:08 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:16:13 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:16:18 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:16:23 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:16:28 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:16:34 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:16:39 INFO Calling node_active for node default/1/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
15:16:39 WARN Smart shutdown: received signal Terminated
15:16:39 INFO pg_autoctl service stopping
2019-09-20 15:23:29 [administrator@pg-primary ~]$
可是我的服務活著啊!
2019-09-20 15:23:29 [administrator@pg-primary ~]$ sudo service pgautofailover status
● pgautofailover.service - pg_auto_failover
Loaded: loaded (/etc/systemd/system/pgautofailover.service; disabled; vendor preset: enabled)
Active: active (running) since Fri 2019-09-20 15:16:39 CST; 7min ago
Main PID: 8611 (pg_autoctl)
Tasks: 1 (limit: 2321)
CGroup: /system.slice/pgautofailover.service
└─8611 /usr/bin/pg_autoctl run
Sep 20 15:16:39 pg-primary systemd[1]: Started pg_auto_failover.
2019-09-20 15:23:56 [administrator@pg-primary ~]$
拿另一台來做實驗,先不要加入 loglevelmax=3 ,看看log到底長怎樣!
2019-09-20 07:25:34 [administrator@pg-slave ~]$ sudo service pgautofailover stop
2019-09-20 07:25:58 [administrator@pg-slave ~]$ sudo tail -10 /var/log/syslog
Sep 20 07:25:32 pg-slave pg_autoctl[5840]: 07:25:32 INFO Calling node_active for node default/4/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
Sep 20 07:25:37 pg-slave pg_autoctl[5840]: 07:25:37 INFO Calling node_active for node default/4/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
Sep 20 07:25:42 pg-slave pg_autoctl[5840]: 07:25:42 INFO Calling node_active for node default/4/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
Sep 20 07:25:47 pg-slave pg_autoctl[5840]: 07:25:47 INFO Calling node_active for node default/4/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
Sep 20 07:25:52 pg-slave pg_autoctl[5840]: 07:25:52 INFO Calling node_active for node default/4/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
Sep 20 07:25:57 pg-slave pg_autoctl[5840]: 07:25:57 INFO Calling node_active for node default/4/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
Sep 20 07:25:58 pg-slave systemd[1]: Stopping pg_auto_failover...
Sep 20 07:25:58 pg-slave pg_autoctl[5840]: 07:25:58 WARN Smart shutdown: received signal Terminated
Sep 20 07:25:58 pg-slave pg_autoctl[5840]: 07:25:58 INFO pg_autoctl service stopping
Sep 20 07:25:58 pg-slave systemd[1]: Stopped pg_auto_failover.
好吧,看不出有什麼差別! 直接把primary 關機測試! 結果還是看不出來 0rz
放棄加入loglevel 的想法,就用logrotate 來管理吧!
UPDATE
用systemd 來管理的作法失敗了,重起之後, pgautofailover 不會自動啟動...
找了很久找不出原因,改用 supervisor來做好了
設定 supervisor
安裝supervisor
sudo apt install supervisor
sudo vim /etc/supervisor/conf.d/pgautofailover.conf
加入以下內容
[program:pgautofailover]
command = pg_autoctl run
user = postgres
autostart=true
autorestart=true
redirect_stderr = true
environment=PGDATA=/var/lib/postgresql/11/main,HOME=/var/lib/postgresql
重新啟動 supervisor & 檢查狀態
2019-09-20 17:26:22 [administrator@pg-primary postgresql]$ sudo service supervisor restart
2019-09-20 17:26:57 [administrator@pg-primary postgresql]$ sudo supervisorctl status
pgautofailover RUNNING pid 14554, uptime 0:00:05
2019-09-20 17:27:03 [administrator@pg-primary postgresql]$
檢查 supervisor log
2019-09-20 17:27:03 [administrator@pg-primary postgresql]$ sudo tail -30 /var/log/supervisor/pgautofailover-stdout---supervisor-5V8qET.log
17:26:58 INFO Managing PostgreSQL installation at "/var/lib/postgresql/11/main"
17:26:58 INFO Found a stale pidfile at "/tmp/pg_autoctl/var/lib/postgresql/11/main/pg_autoctl.pid"
17:26:58 WARN Removing the stale pid file "/tmp/pg_autoctl/var/lib/postgresql/11/main/pg_autoctl.pid"
17:26:58 INFO The version of extenstion "pgautofailover" is "1.0" on the monitor
17:26:58 INFO pg_autoctl service is starting
17:26:58 INFO Calling node_active for node default/1/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
17:27:03 INFO Calling node_active for node default/1/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
17:27:08 INFO Calling node_active for node default/1/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
17:27:13 INFO Calling node_active for node default/1/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
17:27:18 INFO Calling node_active for node default/1/0 with current state: primary, PostgreSQL is running, sync_state is "sync", WAL delta is 0.
2019-09-20 17:27:21 [administrator@pg-primary postgresql]$
簡單多了 ... 在剩下的monitor/slave 也一樣操作安裝、設定supervisor 就可以了
喔,順便把剛剛新增的 systemd config 給砍了..
不過不砍也沒差,反正開機也不會自己啟動 = =+
2019-09-20 17:24:17 [administrator@pg-slave ~]$ sudo rm -rf /etc/systemd/system/pgautofailover.service
#安裝supervisor
2019-09-20 17:23:08 [administrator@pg-slave ~]$ sudo apt install supervisor
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
python-meld3 python-pkg-resources
Suggested packages:
python-setuptools supervisor-doc
The following NEW packages will be installed:
python-meld3 python-pkg-resources supervisor
0 upgraded, 3 newly installed, 0 to remove and 167 not upgraded.
Need to get 415 kB of archives.
After this operation, 2,138 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 python-pkg-resources all 39.0.1-2 [128 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 python-meld3 all 1.0.2-2 [30.9 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/universe amd64 supervisor all 3.3.1-1.1 [256 kB]
Fetched 415 kB in 0s (6,752 kB/s)
Selecting previously unselected package python-pkg-resources.
(Reading database ... 103586 files and directories currently installed.)
Preparing to unpack .../python-pkg-resources_39.0.1-2_all.deb ...
Unpacking python-pkg-resources (39.0.1-2) ...
Selecting previously unselected package python-meld3.
Preparing to unpack .../python-meld3_1.0.2-2_all.deb ...
Unpacking python-meld3 (1.0.2-2) ...
Selecting previously unselected package supervisor.
Preparing to unpack .../supervisor_3.3.1-1.1_all.deb ...
Unpacking supervisor (3.3.1-1.1) ...
Processing triggers for ureadahead (0.100.0-20) ...
Setting up python-meld3 (1.0.2-2) ...
Setting up python-pkg-resources (39.0.1-2) ...
Setting up supervisor (3.3.1-1.1) ...
Created symlink /etc/systemd/system/multi-user.target.wants/supervisor.service → /lib/systemd/system/supervisor.service.
Processing triggers for systemd (237-3ubuntu10.29) ...
Processing triggers for man-db (2.8.3-2) ...
Processing triggers for ureadahead (0.100.0-20) ...
2019-09-20 17:23:37 [administrator@pg-slave ~]$ sudo vim /etc/supervisor/conf.d/pgautofailover.conf
#貼上上面的 conf 內容
2019-09-20 17:30:12 [administrator@pg-slave ~]$ sudo service supervisor restart
2019-09-20 17:30:58 [administrator@pg-slave ~]$ sudo supervisorctl status
pgautofailover RUNNING pid 2232, uptime 0:00:04
2019-09-20 17:31:03 [administrator@pg-slave ~]$ sudo tail -30 /var/log/supervisor/pgautofailover-stdout---supervisor-LC_FG8.log
17:30:59 INFO Managing PostgreSQL installation at "/var/lib/postgresql/11/main"
17:30:59 INFO Found a stale pidfile at "/tmp/pg_autoctl/var/lib/postgresql/11/main/pg_autoctl.pid"
17:30:59 WARN Removing the stale pid file "/tmp/pg_autoctl/var/lib/postgresql/11/main/pg_autoctl.pid"
17:30:59 INFO The version of extenstion "pgautofailover" is "1.0" on the monitor
17:30:59 INFO pg_autoctl service is starting
17:30:59 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 320.
17:31:04 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 320.
17:31:09 INFO Calling node_active for node default/4/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 320.
2019-09-20 17:31:13 [administrator@pg-slave ~]$
搞定,收工!準備寫 ansible playbook!
UPDATE
新增用logrotate 來管理 pgautofailover logfile
sudo vim /etc/logrotate.d/pgautofailover
/var/log/supervisor/pgautofail*.log {
daily
rotate 7
copytruncate
delaycompress # today and yesterday will not compress
compress
missingok
notifempty
}
UPDATE
當node 消失,需要手動在monitor上執行以下指令來移除node ,才能夠再次加入新的node
2019-09-24 01:11:09 [administrator@monitor ~]$ sudo runuser -l postgres -c "psql postgres://autoctl_node@monitor:5432/pg_auto_failover"
psql (11.5 (Ubuntu 11.5-1.pgdg18.04+1))
Type "help" for help.
pg_auto_failover=> select pgautofailover.remove_node('pg-third')
pg_auto_failover-> ;
remove_node
-------------
t
(1 row)
pg_auto_failover=>