PgSQL · 应用案例 · PostgreSQL 时间线修复__背景 1、PG物理流复制的从库，当激活后，可以开启读写，使用pg_rewind可以将从库回退为只读从库的角色。而不需要重建整个从库。 2、当异步主从发生角色切换后，主库的wal目录中可能还有没完全同步到从库的内容，因此老的主库无法直接切换为新主库的从库。使用pg_rewind可以修复老的主库，使之成为新主库的只读从库。而不需要重建整个从库。 3、如果没有pg_rewind，遇到以上情况，需要完全重建从库。或者你可以使用存储层快照，回退回脑裂以前的状态。又或者可以使用文件系统快照，回退回脑裂以前的状态。_

PgSQL · 应用案例 · PostgreSQL 时间线修复

原作者：阿里云内核开发组　　创作时间：2021-03-09 20:36:04+08

wangliyun 发布于2021-03-10 08:00:00　　

评论: 1　　

顶: 598　

踩: 653　

背景

1、PG物理流复制的从库，当激活后，可以开启读写，使用pg_rewind可以将从库回退为只读从库的角色。而不需要重建整个从库。

2、当异步主从发生角色切换后，主库的wal目录中可能还有没完全同步到从库的内容，因此老的主库无法直接切换为新主库的从库。使用pg_rewind可以修复老的主库，使之成为新主库的只读从库。而不需要重建整个从库。

3、如果没有pg_rewind，遇到以上情况，需要完全重建从库。或者你可以使用存储层快照，回退回脑裂以前的状态。又或者可以使用文件系统快照，回退回脑裂以前的状态。

原理与修复步骤

1、使用pg_rewind功能的前提条件：必须开启full page write，必须开启wal hint或者data block checksum。

2、需要被修复的库：从激活点开始，所有的WAL必须存在pgwal目录中。如果WAL已经被覆盖，只要有归档，拷贝到pgwal目录即可。

3、新的主库，从激活点开始，产生的所有WAL必须存在pgwal目录中，或者已归档，并且被修复的库可以使用restorecommand访问到这部分WAL。

4、修改(source db)新主库或老主库配置，允许连接。

5、修复时，连接新主库，得到切换点。或连接老主库，同时比对当前要修复的新主库的TL与老主库进行比对，得到切换点。

6、解析需要被修复的库的从切换点到现在所有的WAL。同时连接source db(新主库（或老主库）)，进行回退操作（被修改或删除的BLOCK从source db获取并覆盖，新增的BLOCK，直接抹除。）回退到切换点的状态。

7、修改被修复库(target db)的recovery.conf, postgresql.conf配置。

8、启动target db，连接source db接收WAL，或restore_command配置接收WAL，从切换点开始所有WAL，进行apply。

9、target db现在是source db的从库。

以EDB PG 11为例讲解

环境部署

《MTK使用 - PG,PPAS,oracle,mysql,ms sql,sybase 迁移到 PG, PPAS (支持跨版本升级)》

（https://github.com/digoal/blog/blob/master/201812/20181226_01.md）

export PS1="$USER@`/bin/hostname -s`-> "      
export PGPORT=4000  
export PGDATA=/data04/ppas11/pg_root4000  
export LANG=en_US.utf8      
export PGHOME=/usr/edb/as11  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH      
export DATE=`date +"%Y%m%d%H%M"`    
export PATH=$PGHOME/bin:$PATH:.      
export MANPATH=$PGHOME/share/man:$MANPATH      
export PGHOST=127.0.0.1  
export PGUSER=postgres   
export PGDATABASE=postgres      
alias rm='rm -i'      
alias ll='ls -lh'      
unalias vi

1、初始化数据库集群

initdb -D /data04/ppas11/pg_root4000 -E UTF8 --lc-collate=C --lc-ctype=en_US.UTF8 -U postgres -k --redwood-like

2、配置recovery.done

cd $PGDATA  

cp $PGHOME/share/recovery.conf.sample ./  

mv recovery.conf.sample recovery.done  

vi recovery.done  

restore_command = 'cp /data04/ppas11/wal/%f %p'  
recovery_target_timeline = 'latest'  
standby_mode = on  
primary_conninfo = 'host=localhost port=4000 user=postgres'

3、配置postgresql.conf

要使用rewind功能：

必须开启fullpagewrites

必须开启datachecksums或wallog_hints

postgresql.conf  

listen_addresses = '0.0.0.0'  
port = 4000  
max_connections = 8000  
superuser_reserved_connections = 13  
unix_socket_directories = '.,/tmp'  
unix_socket_permissions = 0700  
tcp_keepalives_idle = 60  
tcp_keepalives_interval = 10  
tcp_keepalives_count = 10  
shared_buffers = 16GB  
max_prepared_transactions = 8000  
maintenance_work_mem = 1GB  
autovacuum_work_mem = 1GB  
dynamic_shared_memory_type = posix  
vacuum_cost_delay = 0  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 1000  
bgwriter_lru_multiplier = 10.0  
effective_io_concurrency = 0  
max_worker_processes = 128  
max_parallel_maintenance_workers = 8  
max_parallel_workers_per_gather = 8  
max_parallel_workers = 24  
wal_level = replica  
synchronous_commit = off  
full_page_writes = on  
wal_compression = on  
wal_buffers = 32MB  
wal_writer_delay = 10ms  
checkpoint_timeout = 25min  
max_wal_size = 32GB  
min_wal_size = 8GB  
checkpoint_completion_target = 0.2  
archive_mode = on  
archive_command = 'cp -n %p /data04/ppas11/wal/%f'  
max_wal_senders = 16  
wal_keep_segments = 4096  
max_replication_slots = 16  
hot_standby = on  
max_standby_archive_delay = 300s  
max_standby_streaming_delay = 300s  
wal_receiver_status_interval = 1s  
wal_receiver_timeout = 10s  
random_page_cost = 1.1  
effective_cache_size = 400GB  
log_destination = 'csvlog'  
logging_collector = on  
log_directory = 'log'  
log_filename = 'edb-%a.log'  
log_truncate_on_rotation = on  
log_rotation_age = 1d  
log_rotation_size = 0  
log_min_duration_statement = 1s  
log_checkpoints = on  
log_error_verbosity = verbose  
log_line_prefix = '%t '  
log_lock_waits = on  
log_statement = 'ddl'  
log_timezone = 'PRC'  
autovacuum = on  
log_autovacuum_min_duration = 0  
autovacuum_max_workers = 6  
autovacuum_freeze_max_age = 1200000000  
autovacuum_multixact_freeze_max_age = 1400000000  
autovacuum_vacuum_cost_delay = 0  
statement_timeout = 0  
lock_timeout = 0  
idle_in_transaction_session_timeout = 0  
vacuum_freeze_table_age = 1150000000  
vacuum_multixact_freeze_table_age = 1150000000  
datestyle = 'redwood,show_time'  
timezone = 'PRC'  
lc_messages = 'en_US.utf8'  
lc_monetary = 'en_US.utf8'  
lc_numeric = 'en_US.utf8'  
lc_time = 'en_US.utf8'  
default_text_search_config = 'pg_catalog.english'  
shared_preload_libraries = 'auto_explain,pg_stat_statements,$libdir/dbms_pipe,$libdir/edb_gen,$libdir/dbms_aq'  
edb_redwood_date = on  
edb_redwood_greatest_least = on  
edb_redwood_strings = on  
db_dialect = 'redwood'                
edb_dynatune = 66  
edb_dynatune_profile = oltp  
timed_statistics = off

4、配置pg_hba.conf，允许流复制

local   all             all                                     trust  
host    all             all             127.0.0.1/32            trust  
host    all             all             ::1/128                 trust  
local   replication     all                                     trust  
host    replication     all             127.0.0.1/32            trust  
host    replication     all             ::1/128                 trust  
host all all 0.0.0.0/0 md5

5、配置归档目录

mkdir /data04/ppas11/wal    
chown enterprisedb:enterprisedb /data04/ppas11/wal

6、创建从库

pg_basebackup -h 127.0.0.1 -p 4000 -D /data04/ppas11/pg_root4001 -F p -c fast

7、配置从库

cd /data04/ppas11/pg_root4001  

mv recovery.done recovery.conf  
vi postgresql.conf  

port = 4001

8、启动从库

pg_ctl start -D /data04/ppas11/pg_root4001

9、压测主库

pgbench -i -s 1000  

pgbench -M prepared -v -r -P 1 -c 24 -j 24 -T 300

10、检查归档

postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
            240 | 0000000100000000000000F0 | 28-JAN-19 15:08:43.276965 +08:00 |            0 |                 |                  | 28-JAN-19 15:01:17.883338 +08:00  
(1 row)  

postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
            248 | 0000000100000000000000F8 | 28-JAN-19 15:08:45.120134 +08:00 |            0 |                 |                  | 28-JAN-19 15:01:17.883338 +08:00  
(1 row)

11、检查从库延迟

postgres=# select * from pg_stat_replication ;  
-[ RECORD 1 ]----+---------------------------------  
pid              | 8124  
usesysid         | 10  
usename          | postgres  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 62988  
backend_start    | 28-JAN-19 15:07:34.084542 +08:00  
backend_xmin     |   
state            | streaming  
sent_lsn         | 1/88BC2000  
write_lsn        | 1/88BC2000  
flush_lsn        | 1/88BC2000  
replay_lsn       | 1/88077D48  
write_lag        | 00:00:00.001417  
flush_lag        | 00:00:00.002221  
replay_lag       | 00:00:00.097657  
sync_priority    | 0  
sync_state       | async

例子1，从库激活后产生读写，使用pg_rewind修复从库，回退到只读从库

1、激活从库

pg_ctl promote -D /data04/ppas11/pg_root4001

2、写从库

pgbench -M prepared -v -r -P 1 -c 4 -j 4 -T 120 -p 4001

此时从库已经和主库不在一个时间线，无法直接变成当前主库的从库

enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4001|grep -i time  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Time of latest checkpoint:            Mon 28 Jan 2019 03:56:38 PM CST  
Min recovery ending loc's timeline:   2  
track_commit_timestamp setting:       off  
Date/time type storage:               64-bit integers  

enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4000|grep -i time  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Time of latest checkpoint:            Mon 28 Jan 2019 05:11:38 PM CST  
Min recovery ending loc's timeline:   0  
track_commit_timestamp setting:       off  
Date/time type storage:               64-bit integers

3、修复从库，使之继续成为当前主库的从库

4、查看切换点

cd /data04/ppas11/pg_root4001  

ll pg_wal/*.history  
-rw------- 1 enterprisedb enterprisedb 42 Jan 28 17:15 pg_wal/00000002.history  

cat pg_wal/00000002.history  
1       6/48C62000      no recovery target specified

5、从库激活时间开始产生的WAL必须全部在pg_wal目录中。

-rw------- 1 enterprisedb enterprisedb 42 Jan 28 17:15 00000002.history

-rw------- 1 enterprisedb enterprisedb 16M Jan 28 17:16 000000020000000600000048

…………

000000020000000600000048开始，所有的wal必须存在从库pgwal目录中。如果已经覆盖了，必须从归档目录拷贝到从库pgwal目录中。

6、从库激活时，主库从这个时间点开始所有的WAL还在pgwal目录，或者从库可以使用restorecommand获得(recovery.conf)。

recovery.conf  

restore_command = 'cp /data04/ppas11/wal/%f %p'

7、pg_rewind命令帮助

https://www.postgresql.org/docs/11/app-pgrewind.html
pg_rewind --help  
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.  

Usage:  
  pg_rewind [OPTION]...  

Options:  
  -D, --target-pgdata=DIRECTORY  existing data directory to modify  
      --source-pgdata=DIRECTORY  source data directory to synchronize with  
      --source-server=CONNSTR    source server to synchronize with  
  -n, --dry-run                  stop before modifying anything  
  -P, --progress                 write progress messages  
      --debug                    write a lot of debug messages  
  -V, --version                  output version information, then exit  
  -?, --help                     show this help, then exit  

Report bugs to <support@enterprisedb.com>.

8、停库（被修复的库，停库）

pg_ctl stop -m fast -D /data04/ppas11/pg_root4001

9、尝试修复

pg_rewind -n -D /data04/ppas11/pg_root4001 --source-server="hostaddr=127.0.0.1 user=postgres port=4000"  

servers diverged at WAL location 6/48C62000 on timeline 1  
rewinding from last common checkpoint at 5/5A8CD30 on timeline 1  
Done!

10、尝试正常，说明可以修复，实施修复

pg_rewind -D /data04/ppas11/pg_root4001 --source-server="hostaddr=127.0.0.1 user=postgres port=4000"  

servers diverged at WAL location 6/48C62000 on timeline 1  
rewinding from last common checkpoint at 5/5A8CD30 on timeline 1  
Done!

11、已修复，改配置

cd /data04/ppas11/pg_root4001  

vi postgresql.conf  
port = 4001  
mv recovery.done recovery.conf  

vi recovery.conf  

restore_command = 'cp /data04/ppas11/wal/%f %p'   
recovery_target_timeline = 'latest'   
standby_mode = on   
primary_conninfo = 'host=localhost port=4000 user=postgres'

12、删除归档中错误时间线上产生的文件否则会在启动修复后的从库后，走到00000002时间线上，这是不想看到的。

mkdir /data04/ppas11/wal/error_tl_2  

mv /data04/ppas11/wal/00000002* /data04/ppas11/wal/error_tl_2

13、启动从库

pg_ctl start -D /data04/ppas11/pg_root4001

14、建议对主库做一个检查点，从库收到检查点后，重启后不需要应用太多WAL，而是从新检查点开始恢复

psql  
checkpoint;

15、压测主库

pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4000

16、查看归档状态

postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
           1756 | 0000000100000006000000DC | 28-JAN-19 17:41:57.562425 +08:00 |            0 |                 |                  | 28-JAN-19 15:01:17.883338 +08:00  
(1 row)

17、查看从库健康、延迟，观察修复后的情况

postgres=# select * from pg_stat_replication ;  
-[ RECORD 1 ]----+--------------------------------  
pid              | 13179  
usesysid         | 10  
usename          | postgres  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63198  
backend_start    | 28-JAN-19 17:47:29.85308 +08:00  
backend_xmin     |   
state            | catchup  
sent_lsn         | 7/DDE80000  
write_lsn        | 7/DC000000  
flush_lsn        | 7/DC000000  
replay_lsn       | 7/26A8DCB0  
write_lag        | 00:00:18.373263  
flush_lag        | 00:00:18.373263  
replay_lag       | 00:00:18.373263  
sync_priority    | 0  
sync_state       | async

例子2，从库激活成为新主库后，老主库依旧有读写，使用pg_rewind修复老主库，将老主库降级为新主库的从库

1、激活从库

pg_ctl promote -D /data04/ppas11/pg_root4001

2、写从库

pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4001

3、写主库

pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4000

此时老主库已经和新的主库不在一个时间线

enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4000|grep -i timeline  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Min recovery ending loc's timeline:   0  
enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4001|grep -i timeline  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Min recovery ending loc's timeline:   2  


enterprisedb@pg11-test-> cd /data04/ppas11/pg_root4001/pg_wal  
enterprisedb@pg11-test-> cat 00000002.history   
1       8/48DE2318      no recovery target specified  

enterprisedb@pg11-test-> ll *.partial  
-rw------- 1 enterprisedb enterprisedb 16M Jan 28 17:48 000000010000000800000048.partial

4、修复老主库，变成从库

4.1、从库激活时，老主库从这个时间点开始所有的WAL，必须全部在pg_wal目录中。

000000010000000800000048 开始的所有WAL必须存在pgwal，如果已经覆盖了，必须从WAL归档拷贝到pgwal目录

4.2、从库激活时间开始产生的所有WAL，老主库必须可以使用restore_command获得(recovery.conf)。

recovery.conf  

restore_command = 'cp /data04/ppas11/wal/%f %p'

5、关闭老主库

pg_ctl stop -m fast -D /data04/ppas11/pg_root4000

6、尝试修复老主库

pg_rewind -n -D /data04/ppas11/pg_root4000 --source-server="hostaddr=127.0.0.1 user=postgres port=4001"  

servers diverged at WAL location 8/48DE2318 on timeline 1  
rewinding from last common checkpoint at 6/CCCEF770 on timeline 1  
Done!

7、尝试成功，可以修复，实施修复

pg_rewind -D /data04/ppas11/pg_root4000 --source-server="hostaddr=127.0.0.1 user=postgres port=4001"

8、修复完成后，改配置

cd /data04/ppas11/pg_root4000  

vi postgresql.conf  
port = 4000  
mv recovery.done recovery.conf  

vi recovery.conf  

restore_command = 'cp /data04/ppas11/wal/%f %p'   
recovery_target_timeline = 'latest'   
standby_mode = on   
primary_conninfo = 'host=localhost port=4001 user=postgres'

9、启动老主库

pg_ctl start -D /data04/ppas11/pg_root4000

10、建议对新主库做一个检查点，从库收到检查点后，重启后不需要应用太多WAL，而是从新检查点开始恢复

checkpoint;

11、压测新主库

pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4001

12、查看归档状态

psql -p 4001  


postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
            406 | 0000000200000009000000DB | 28-JAN-19 21:18:22.976118 +08:00 |            0 |                 |                  | 28-JAN-19 17:47:29.847488 +08:00  
(1 row)

13、查看从库健康、延迟

psql -p 4001  

postgres=# select * from pg_stat_replication ;  
-[ RECORD 1 ]----+---------------------------------  
pid              | 17675  
usesysid         | 10  
usename          | postgres  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 60530  
backend_start    | 28-JAN-19 21:18:36.472197 +08:00  
backend_xmin     |   
state            | streaming  
sent_lsn         | 9/E8361C18  
write_lsn        | 9/E8361C18  
flush_lsn        | 9/E8361C18  
replay_lsn       | 9/D235B520  
write_lag        | 00:00:00.000101  
flush_lag        | 00:00:00.000184  
replay_lag       | 00:00:03.028098  
sync_priority    | 0  
sync_state       | async

小结

1 适合场景

1、PG物理流复制的从库，当激活后，可以开启读写，使用pg_rewind可以将从库回退为只读从库的角色。而不需要重建整个从库。

2、当异步主从发生角色切换后，主库的wal目录中可能还有没完全同步到从库的内容，因此老的主库无法直接切换为新主库的从库。使用pg_rewind可以修复老的主库，使之成为新主库的只读从库。而不需要重建整个从库。

如果没有pg_rewind，遇到以上情况，需要完全重建从库，如果库占用空间很大，重建非常耗时，也非常耗费上游数据库的资源（读）。

2 前提

要使用rewind功能：

1、必须开启fullpagewrites

2、必须开启datachecksums或wallog_hints

initdb -k 开启data_checksums

3 原理与修复流程

1、使用pg_rewind功能的前提条件：必须开启full page write，必须开启wal hint或者data block checksum。

2、需要被修复的库：从激活点开始，所有的WAL必须存在pgwal目录中。如果WAL已经被覆盖，只要有归档，拷贝到pgwal目录即可。

3、新的主库，从激活点开始，产生的所有WAL必须存在pgwal目录中，或者已归档，并且被修复的库可以使用restorecommand访问到这部分WAL。

4、修改(source db)新主库或老主库配置，允许连接。

5、修复时，连接新主库，得到切换点。或连接老主库，同时比对当前要修复的新主库的TL与老主库进行比对，得到切换点。

6、解析需要被修复的库的从切换点到现在所有的WAL。同时连接source db(新主库（或老主库）)，进行回退操作（被修改或删除的BLOCK从source db获取并覆盖，新增的BLOCK，直接抹除。）回退到切换点的状态。

7、修改被修复库(target db)的recovery.conf, postgresql.conf配置。

8、启动target db，连接source db接收WAL，或restore_command配置接收WAL，从切换点开始所有WAL，进行apply。

9、target db现在是source db的从库。

PostgreSQL中文社区欢迎广大技术人员投稿

投稿邮箱：press@postgres.cn

### **背景**
#### 1、PG物理流复制的从库，当激活后，可以开启读写，使用pg_rewind可以将从库回退为只读从库的角色。而不需要重建整个从库。
#### 2、当异步主从发生角色切换后，主库的wal目录中可能还有没完全同步到从库的内容，因此老的主库无法直接切换为新主库的从库。使用pg_rewind可以修复老的主库，使之成为新主库的只读从库。而不需要重建整个从库。
#### 3、如果没有pg_rewind，遇到以上情况，需要完全重建从库。或者你可以使用存储层快照，回退回脑裂以前的状态。又或者可以使用文件系统快照，回退回脑裂以前的状态。

### **原理与修复步骤**
#### 1、使用pg_rewind功能的前提条件：必须开启full page write，必须开启wal hint或者data block checksum。
#### 2、需要被修复的库：从激活点开始，所有的WAL必须存在pg_wal目录中。如果WAL已经被覆盖，只要有归档，拷贝到pg_wal目录即可。
#### 3、新的主库，从激活点开始，产生的所有WAL必须存在pg_wal目录中，或者已归档，并且被修复的库可以使用restore_command访问到这部分WAL。
#### 4、修改(source db)新主库或老主库配置，允许连接。
#### 5、修复时，连接新主库，得到切换点。或连接老主库，同时比对当前要修复的新主库的TL与老主库进行比对，得到切换点。
#### 6、解析需要被修复的库的从切换点到现在所有的WAL。同时连接source db(新主库（或老主库）)，进行回退操作（被修改或删除的BLOCK从source db获取并覆盖，新增的BLOCK，直接抹除。）回退到切换点的状态。
#### 7、修改被修复库(target db)的recovery.conf, postgresql.conf配置。
#### 8、启动target db，连接source db接收WAL，或restore_command配置接收WAL，从切换点开始所有WAL，进行apply。
#### 9、target db现在是source db的从库。

### **以EDB PG 11为例讲解**
### **环境部署**
#### 《MTK使用 - PG,PPAS,oracle,mysql,ms sql,sybase 迁移到 PG, PPAS (支持跨版本升级)》

####（https://github.com/digoal/blog/blob/master/201812/20181226_01.md）
```
export PS1="$USER@`/bin/hostname -s`-> "      
export PGPORT=4000  
export PGDATA=/data04/ppas11/pg_root4000  
export LANG=en_US.utf8      
export PGHOME=/usr/edb/as11  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH      
export DATE=`date +"%Y%m%d%H%M"`    
export PATH=$PGHOME/bin:$PATH:.      
export MANPATH=$PGHOME/share/man:$MANPATH      
export PGHOST=127.0.0.1  
export PGUSER=postgres   
export PGDATABASE=postgres      
alias rm='rm -i'      
alias ll='ls -lh'      
unalias vi
```
#### 1、初始化数据库集群
```
initdb -D /data04/ppas11/pg_root4000 -E UTF8 --lc-collate=C --lc-ctype=en_US.UTF8 -U postgres -k --redwood-like
```
#### 2、配置recovery.done
```
cd $PGDATA

cp $PGHOME/share/recovery.conf.sample ./

mv recovery.conf.sample recovery.done

vi recovery.done

restore_command = 'cp /data04/ppas11/wal/%f %p'  
recovery_target_timeline = 'latest'  
standby_mode = on  
primary_conninfo = 'host=localhost port=4000 user=postgres'
```
#### 3、配置postgresql.conf
#### 要使用rewind功能：
#### 必须开启full_page_writes
#### 必须开启data_checksums或wal_log_hints
```
postgresql.conf

listen_addresses = '0.0.0.0'  
port = 4000  
max_connections = 8000  
superuser_reserved_connections = 13  
unix_socket_directories = '.,/tmp'  
unix_socket_permissions = 0700  
tcp_keepalives_idle = 60  
tcp_keepalives_interval = 10  
tcp_keepalives_count = 10  
shared_buffers = 16GB  
max_prepared_transactions = 8000  
maintenance_work_mem = 1GB  
autovacuum_work_mem = 1GB  
dynamic_shared_memory_type = posix  
vacuum_cost_delay = 0  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 1000  
bgwriter_lru_multiplier = 10.0  
effective_io_concurrency = 0  
max_worker_processes = 128  
max_parallel_maintenance_workers = 8  
max_parallel_workers_per_gather = 8  
max_parallel_workers = 24  
wal_level = replica  
synchronous_commit = off  
full_page_writes = on  
wal_compression = on  
wal_buffers = 32MB  
wal_writer_delay = 10ms  
checkpoint_timeout = 25min  
max_wal_size = 32GB  
min_wal_size = 8GB  
checkpoint_completion_target = 0.2  
archive_mode = on  
archive_command = 'cp -n %p /data04/ppas11/wal/%f'  
max_wal_senders = 16  
wal_keep_segments = 4096  
max_replication_slots = 16  
hot_standby = on  
max_standby_archive_delay = 300s  
max_standby_streaming_delay = 300s  
wal_receiver_status_interval = 1s  
wal_receiver_timeout = 10s  
random_page_cost = 1.1  
effective_cache_size = 400GB  
log_destination = 'csvlog'  
logging_collector = on  
log_directory = 'log'  
log_filename = 'edb-%a.log'  
log_truncate_on_rotation = on  
log_rotation_age = 1d  
log_rotation_size = 0  
log_min_duration_statement = 1s  
log_checkpoints = on  
log_error_verbosity = verbose  
log_line_prefix = '%t '  
log_lock_waits = on  
log_statement = 'ddl'  
log_timezone = 'PRC'  
autovacuum = on  
log_autovacuum_min_duration = 0  
autovacuum_max_workers = 6  
autovacuum_freeze_max_age = 1200000000  
autovacuum_multixact_freeze_max_age = 1400000000  
autovacuum_vacuum_cost_delay = 0  
statement_timeout = 0  
lock_timeout = 0  
idle_in_transaction_session_timeout = 0  
vacuum_freeze_table_age = 1150000000  
vacuum_multixact_freeze_table_age = 1150000000  
datestyle = 'redwood,show_time'  
timezone = 'PRC'  
lc_messages = 'en_US.utf8'  
lc_monetary = 'en_US.utf8'  
lc_numeric = 'en_US.utf8'  
lc_time = 'en_US.utf8'  
default_text_search_config = 'pg_catalog.english'  
shared_preload_libraries = 'auto_explain,pg_stat_statements,$libdir/dbms_pipe,$libdir/edb_gen,$libdir/dbms_aq'  
edb_redwood_date = on  
edb_redwood_greatest_least = on  
edb_redwood_strings = on  
db_dialect = 'redwood'                
edb_dynatune = 66  
edb_dynatune_profile = oltp  
timed_statistics = off
```
#### 4、配置pg_hba.conf，允许流复制
```
local   all             all                                     trust  
host    all             all             127.0.0.1/32            trust  
host    all             all             ::1/128                 trust  
local   replication     all                                     trust  
host    replication     all             127.0.0.1/32            trust  
host    replication     all             ::1/128                 trust  
host all all 0.0.0.0/0 md5
```
#### 5、配置归档目录
```
mkdir /data04/ppas11/wal    
chown enterprisedb:enterprisedb /data04/ppas11/wal
```
#### 6、创建从库
```
pg_basebackup -h 127.0.0.1 -p 4000 -D /data04/ppas11/pg_root4001 -F p -c fast
```
#### 7、配置从库
```
cd /data04/ppas11/pg_root4001

mv recovery.done recovery.conf  
vi postgresql.conf

port = 4001
```
#### 8、启动从库
```
pg_ctl start -D /data04/ppas11/pg_root4001
```
#### 9、压测主库
```
pgbench -i -s 1000

pgbench -M prepared -v -r -P 1 -c 24 -j 24 -T 300
```
#### 10、检查归档
```
postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
            240 | 0000000100000000000000F0 | 28-JAN-19 15:08:43.276965 +08:00 |            0 |                 |                  | 28-JAN-19 15:01:17.883338 +08:00  
(1 row)

postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
            248 | 0000000100000000000000F8 | 28-JAN-19 15:08:45.120134 +08:00 |            0 |                 |                  | 28-JAN-19 15:01:17.883338 +08:00  
(1 row)
```
#### 11、检查从库延迟
```
postgres=# select * from pg_stat_replication ;  
-[ RECORD 1 ]----+---------------------------------  
pid              | 8124  
usesysid         | 10  
usename          | postgres  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 62988  
backend_start    | 28-JAN-19 15:07:34.084542 +08:00  
backend_xmin     |   
state            | streaming  
sent_lsn         | 1/88BC2000  
write_lsn        | 1/88BC2000  
flush_lsn        | 1/88BC2000  
replay_lsn       | 1/88077D48  
write_lag        | 00:00:00.001417  
flush_lag        | 00:00:00.002221  
replay_lag       | 00:00:00.097657  
sync_priority    | 0  
sync_state       | async
```
### **例子1，从库激活后产生读写，使用pg_rewind修复从库，回退到只读从库**
#### 1、激活从库
```
pg_ctl promote -D /data04/ppas11/pg_root4001
```
#### 2、写从库
```
pgbench -M prepared -v -r -P 1 -c 4 -j 4 -T 120 -p 4001
```
#### 此时从库已经和主库不在一个时间线，无法直接变成当前主库的从库
```
enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4001|grep -i time  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Time of latest checkpoint:            Mon 28 Jan 2019 03:56:38 PM CST  
Min recovery ending loc's timeline:   2  
track_commit_timestamp setting:       off  
Date/time type storage:               64-bit integers

enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4000|grep -i time  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Time of latest checkpoint:            Mon 28 Jan 2019 05:11:38 PM CST  
Min recovery ending loc's timeline:   0  
track_commit_timestamp setting:       off  
Date/time type storage:               64-bit integers
```
#### 3、修复从库，使之继续成为当前主库的从库
#### 4、查看切换点
```
cd /data04/ppas11/pg_root4001

ll pg_wal/*.history  
-rw------- 1 enterprisedb enterprisedb 42 Jan 28 17:15 pg_wal/00000002.history

cat pg_wal/00000002.history  
1       6/48C62000      no recovery target specified
```
#### 5、从库激活时间开始产生的WAL必须全部在pg_wal目录中。
#### -rw------- 1 enterprisedb enterprisedb   42 Jan 28 17:15 00000002.history  
#### -rw------- 1 enterprisedb enterprisedb  16M Jan 28 17:16 000000020000000600000048  
#### ............
#### 000000020000000600000048开始，所有的wal必须存在从库pg_wal目录中。如果已经覆盖了，必须从归档目录拷贝到从库pg_wal目录中。
#### 6、从库激活时，主库从这个时间点开始所有的WAL还在pg_wal目录，或者从库可以使用restore_command获得(recovery.conf)。
```
recovery.conf

restore_command = 'cp /data04/ppas11/wal/%f %p'
```
#### 7、pg_rewind命令帮助
```
https://www.postgresql.org/docs/11/app-pgrewind.html
pg_rewind --help  
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.

Usage:  
  pg_rewind [OPTION]...

Options:  
  -D, --target-pgdata=DIRECTORY  existing data directory to modify  
      --source-pgdata=DIRECTORY  source data directory to synchronize with  
      --source-server=CONNSTR    source server to synchronize with  
  -n, --dry-run                  stop before modifying anything  
  -P, --progress                 write progress messages  
      --debug                    write a lot of debug messages  
  -V, --version                  output version information, then exit  
  -?, --help                     show this help, then exit

Report bugs to <support@enterprisedb.com>.
```
#### 8、停库（被修复的库，停库）
```
pg_ctl stop -m fast -D /data04/ppas11/pg_root4001
```
#### 9、尝试修复
```
pg_rewind -n -D /data04/ppas11/pg_root4001 --source-server="hostaddr=127.0.0.1 user=postgres port=4000"

servers diverged at WAL location 6/48C62000 on timeline 1  
rewinding from last common checkpoint at 5/5A8CD30 on timeline 1  
Done!
```
#### 10、尝试正常，说明可以修复，实施修复
```
pg_rewind -D /data04/ppas11/pg_root4001 --source-server="hostaddr=127.0.0.1 user=postgres port=4000"

servers diverged at WAL location 6/48C62000 on timeline 1  
rewinding from last common checkpoint at 5/5A8CD30 on timeline 1  
Done!
```
#### 11、已修复，改配置
```
cd /data04/ppas11/pg_root4001

vi postgresql.conf  
port = 4001  
mv recovery.done recovery.conf

vi recovery.conf

restore_command = 'cp /data04/ppas11/wal/%f %p'   
recovery_target_timeline = 'latest'   
standby_mode = on   
primary_conninfo = 'host=localhost port=4000 user=postgres'
```
#### 12、删除归档中错误时间线上产生的文件否则会在启动修复后的从库后，走到00000002时间线上，这是不想看到的。
```
mkdir /data04/ppas11/wal/error_tl_2

mv /data04/ppas11/wal/00000002* /data04/ppas11/wal/error_tl_2
```
#### 13、启动从库
```
pg_ctl start -D /data04/ppas11/pg_root4001
```
#### 14、建议对主库做一个检查点，从库收到检查点后，重启后不需要应用太多WAL，而是从新检查点开始恢复
```
psql  
checkpoint;
```
#### 15、压测主库
```
pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4000
```
#### 16、查看归档状态
```
postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
           1756 | 0000000100000006000000DC | 28-JAN-19 17:41:57.562425 +08:00 |            0 |                 |                  | 28-JAN-19 15:01:17.883338 +08:00  
(1 row)
```
#### 17、查看从库健康、延迟，观察修复后的情况
```
postgres=# select * from pg_stat_replication ;  
-[ RECORD 1 ]----+--------------------------------  
pid              | 13179  
usesysid         | 10  
usename          | postgres  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63198  
backend_start    | 28-JAN-19 17:47:29.85308 +08:00  
backend_xmin     |   
state            | catchup  
sent_lsn         | 7/DDE80000  
write_lsn        | 7/DC000000  
flush_lsn        | 7/DC000000  
replay_lsn       | 7/26A8DCB0  
write_lag        | 00:00:18.373263  
flush_lag        | 00:00:18.373263  
replay_lag       | 00:00:18.373263  
sync_priority    | 0  
sync_state       | async
```
### **例子2，从库激活成为新主库后，老主库依旧有读写，使用pg_rewind修复老主库，将老主库降级为新主库的从库**
#### 1、激活从库
```
pg_ctl promote -D /data04/ppas11/pg_root4001
```
#### 2、写从库
```
pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4001
```
#### 3、写主库
#### pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4000
#### 此时老主库已经和新的主库不在一个时间线
```
enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4000|grep -i timeline  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Min recovery ending loc's timeline:   0  
enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4001|grep -i timeline  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Min recovery ending loc's timeline:   2

enterprisedb@pg11-test-> cd /data04/ppas11/pg_root4001/pg_wal  
enterprisedb@pg11-test-> cat 00000002.history   
1       8/48DE2318      no recovery target specified

enterprisedb@pg11-test-> ll *.partial  
-rw------- 1 enterprisedb enterprisedb 16M Jan 28 17:48 000000010000000800000048.partial
```
#### 4、修复老主库，变成从库
#### 4.1、从库激活时，老主库从这个时间点开始所有的WAL，必须全部在pg_wal目录中。
#### 000000010000000800000048 开始的所有WAL必须存在pg_wal，如果已经覆盖了，必须从WAL归档拷贝到pg_wal目录
#### 4.2、从库激活时间开始产生的所有WAL，老主库必须可以使用restore_command获得(recovery.conf)。
```
recovery.conf

restore_command = 'cp /data04/ppas11/wal/%f %p'
```
#### 5、关闭老主库
```
pg_ctl stop -m fast -D /data04/ppas11/pg_root4000
```
#### 6、尝试修复老主库
```
pg_rewind -n -D /data04/ppas11/pg_root4000 --source-server="hostaddr=127.0.0.1 user=postgres port=4001"

servers diverged at WAL location 8/48DE2318 on timeline 1  
rewinding from last common checkpoint at 6/CCCEF770 on timeline 1  
Done!
```
#### 7、尝试成功，可以修复，实施修复
```
pg_rewind -D /data04/ppas11/pg_root4000 --source-server="hostaddr=127.0.0.1 user=postgres port=4001"
```
#### 8、修复完成后，改配置
```
cd /data04/ppas11/pg_root4000

vi postgresql.conf  
port = 4000  
mv recovery.done recovery.conf

vi recovery.conf

restore_command = 'cp /data04/ppas11/wal/%f %p'   
recovery_target_timeline = 'latest'   
standby_mode = on   
primary_conninfo = 'host=localhost port=4001 user=postgres'
```
#### 9、启动老主库
```
pg_ctl start -D /data04/ppas11/pg_root4000
```
#### 10、建议对新主库做一个检查点，从库收到检查点后，重启后不需要应用太多WAL，而是从新检查点开始恢复
```
checkpoint;
```
#### 11、压测新主库
```
pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4001
```
#### 12、查看归档状态
```
psql -p 4001

#### **2 前提**
#### 要使用rewind功能：
#### 1、必须开启full_page_writes
#### 2、必须开启data_checksums或wal_log_hints
#### initdb -k 开启data_checksums

#### **3 原理与修复流程**
#### 1、使用pg_rewind功能的前提条件：必须开启full page write，必须开启wal hint或者data block checksum。
#### 2、需要被修复的库：从激活点开始，所有的WAL必须存在pg_wal目录中。如果WAL已经被覆盖，只要有归档，拷贝到pg_wal目录即可。
#### 3、新的主库，从激活点开始，产生的所有WAL必须存在pg_wal目录中，或者已归档，并且被修复的库可以使用restore_command访问到这部分WAL。
#### 4、修改(source db)新主库或老主库配置，允许连接。
#### 5、修复时，连接新主库，得到切换点。或连接老主库，同时比对当前要修复的新主库的TL与老主库进行比对，得到切换点。
#### 6、解析需要被修复的库的从切换点到现在所有的WAL。同时连接source db(新主库（或老主库）)，进行回退操作（被修改或删除的BLOCK从source db获取并覆盖，新增的BLOCK，直接抹除。）回退到切换点的状态。
#### 7、修改被修复库(target db)的recovery.conf, postgresql.conf配置。
#### 8、启动target db，连接source db接收WAL，或restore_command配置接收WAL，从切换点开始所有WAL，进行apply。
#### 9、target db现在是source db的从库。

#### **PostgreSQL中文社区欢迎广大技术人员投稿**
#### **投稿邮箱：press@postgres.cn**

评论:1　　

顶: 598　

踩: 653　

请在登录后发表评论，否则无法保存。

1# __ xiaowu 回答于 2024-04-20 21:38:04+08

伤心的网名:https://www.nanss.com/mingcheng/455.html 儿童心理特点:https://www.nanss.com/yuer/115.html 霸气网名:https://www.nanss.com/mingcheng/252.html 巨蟹座日期:https://www.nanss.com/xingzuo/69.html 让人过目不忘的网名:https://www.nanss.com/mingcheng/243.html 自荐信:https://www.nanss.com/xuexi/214.html 我的愿望普通话三分钟:https://www.nanss.com/xuexi/31.html 简短的高级情话:https://www.nanss.com/shenghuo/160.html 既文明又骂人解气的话:https://www.nanss.com/yulu/329.html 三个字好听的网名:https://www.nanss.com/mingcheng/496.html 昵称大全:https://www.nanss.com/mingcheng/149.html 过年文案:https://www.nanss.com/wenan/20.html 茶庄名字:https://www.nanss.com/mingcheng/52.html 昵称简短好听:https://www.nanss.com/mingcheng/129.html 自我评价:https://www.nanss.com/gongzuo/402.html 女生好听罕见的英文名:https://www.nanss.com/mingcheng/429.html 工作室名字:https://www.nanss.com/mingcheng/53.html 玫瑰花多少朵的含义:https://www.nanss.com/wenti/125.html 微信名字男:https://www.nanss.com/mingcheng/210.html 房门朝向风水:https://www.nanss.com/jiaju/80.html 实习总结:https://www.nanss.com/gongzuo/163.html 王者荣耀昵称:https://www.nanss.com/mingcheng/358.html 有诗意的网名:https://www.nanss.com/mingcheng/452.html 高质量文案句子:https://www.nanss.com/wenan/192.html 女人最有福气的微信名:https://www.nanss.com/mingcheng/362.html 天蝎座日期:https://www.nanss.com/xingzuo/73.html 英文网名女:https://www.nanss.com/mingcheng/417.html 和平精英名字:https://www.nanss.com/mingcheng/366.html 实习周报:https://www.nanss.com/gongzuo/232.html 感谢老师对孩子的教育和辛苦的句子:https://www.nanss.com/yulu/26.html

发表评论：

您还没有登录,请您登录后再发表评论

加入我们

QQ群1：5276420

QQ群2：3336901

QQ群3：254622631

文档群：150657323

文档翻译平台：按此访问

社区邮件列表：按此订阅

商业支持

扫码关注

背景

​1、PG物理流复制的从库，当激活后，可以开启读写，使用pg_rewind可以将从库回退为只读从库的角色。而不需要重建整个从库。

2、当异步主从发生角色切换后，主库的wal目录中可能还有没完全同步到从库的内容，因此老的主库无法直接切换为新主库的从库。使用pg_rewind可以修复老的主库，使之成为新主库的只读从库。而不需要重建整个从库。

3、如果没有pg_rewind，遇到以上情况，需要完全重建从库。或者你可以使用存储层快照，回退回脑裂以前的状态。又或者可以使用文件系统快照，回退回脑裂以前的状态。

原理与修复步骤

1、使用pg_rewind功能的前提条件：必须开启full page write，必须开启wal hint或者data block checksum。

2、需要被修复的库：从激活点开始，所有的WAL必须存在pgwal目录中。如果WAL已经被覆盖，只要有归档，拷贝到pgwal目录即可。

3、新的主库，从激活点开始，产生的所有WAL必须存在pgwal目录中，或者已归档，并且被修复的库可以使用restorecommand访问到这部分WAL。

4、修改(source db)新主库或老主库配置，允许连接。

5、修复时，连接新主库，得到切换点。或连接老主库，同时比对当前要修复的新主库的TL与老主库进行比对，得到切换点。

6、解析需要被修复的库的从切换点到现在所有的WAL。同时连接source db(新主库（或老主库）)，进行回退操作（被修改或删除的BLOCK从source db获取并覆盖，新增的BLOCK，直接抹除。）回退到切换点的状态。

7、修改被修复库(target db)的recovery.conf, postgresql.conf配置。

8、启动target db，连接source db接收WAL，或restore_command配置接收WAL，从切换点开始所有WAL，进行apply。

9、target db现在是source db的从库。

以EDB PG 11为例讲解

环境部署

《MTK使用 - PG,PPAS,oracle,mysql,ms sql,sybase 迁移到 PG, PPAS (支持跨版本升级)》

（https://github.com/digoal/blog/blob/master/201812/20181226_01.md）

1、初始化数据库集群

2、配置recovery.done

3、配置postgresql.conf

要使用rewind功能：

必须开启fullpagewrites

必须开启datachecksums或wallog_hints

4、配置pg_hba.conf，允许流复制

5、配置归档目录

6、创建从库

7、配置从库

8、启动从库

9、压测主库

10、检查归档

11、检查从库延迟

例子1，从库激活后产生读写，使用pg_rewind修复从库，回退到只读从库

1、激活从库

2、写从库

此时从库已经和主库不在一个时间线，无法直接变成当前主库的从库

3、修复从库，使之继续成为当前主库的从库

4、查看切换点

5、从库激活时间开始产生的WAL必须全部在pg_wal目录中。

-rw------- 1 enterprisedb enterprisedb 42 Jan 28 17:15 00000002.history

-rw------- 1 enterprisedb enterprisedb 16M Jan 28 17:16 000000020000000600000048

…………

000000020000000600000048开始，所有的wal必须存在从库pgwal目录中。如果已经覆盖了，必须从归档目录拷贝到从库pgwal目录中。

6、从库激活时，主库从这个时间点开始所有的WAL还在pgwal目录，或者从库可以使用restorecommand获得(recovery.conf)。

7、pg_rewind命令帮助

8、停库（被修复的库，停库）

9、尝试修复

10、尝试正常，说明可以修复，实施修复

11、已修复，改配置

12、删除归档中错误时间线上产生的文件否则会在启动修复后的从库后，走到00000002时间线上，这是不想看到的。

13、启动从库

14、建议对主库做一个检查点，从库收到检查点后，重启后不需要应用太多WAL，而是从新检查点开始恢复

15、压测主库

16、查看归档状态

17、查看从库健康、延迟，观察修复后的情况

例子2，从库激活成为新主库后，老主库依旧有读写，使用pg_rewind修复老主库，将老主库降级为新主库的从库

1、激活从库

2、写从库

3、写主库

pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4000

此时老主库已经和新的主库不在一个时间线

4、修复老主库，变成从库

4.1、从库激活时，老主库从这个时间点开始所有的WAL，必须全部在pg_wal目录中。

000000010000000800000048 开始的所有WAL必须存在pgwal，如果已经覆盖了，必须从WAL归档拷贝到pgwal目录

4.2、从库激活时间开始产生的所有WAL，老主库必须可以使用restore_command获得(recovery.conf)。

5、关闭老主库

6、尝试修复老主库

7、尝试成功，可以修复，实施修复

8、修复完成后，改配置

9、启动老主库

10、建议对新主库做一个检查点，从库收到检查点后，重启后不需要应用太多WAL，而是从新检查点开始恢复

11、压测新主库

12、查看归档状态

13、查看从库健康、延迟

小结

1 适合场景

1、PG物理流复制的从库，当激活后，可以开启读写，使用pg_rewind可以将从库回退为只读从库的角色。而不需要重建整个从库。

2、当异步主从发生角色切换后，主库的wal目录中可能还有没完全同步到从库的内容，因此老的主库无法直接切换为新主库的从库。使用pg_rewind可以修复老的主库，使之成为新主库的只读从库。而不需要重建整个从库。

如果没有pg_rewind，遇到以上情况，需要完全重建从库，如果库占用空间很大，重建非常耗时，也非常耗费上游数据库的资源（读）。

2 前提

1、PG物理流复制的从库，当激活后，可以开启读写，使用pg_rewind可以将从库回退为只读从库的角色。而不需要重建整个从库。