背景
PostgreSQL heap TABLE AM引擎,使用多版本来解决快照问题,版本处于当前数据文件中,有垃圾回收进程进行回收,那么哪些垃圾不能被回收呢?
WAL是PG的REDO文件,哪些WAL不能被回收重复利用?什么情况下可能会一直增长不清理呢?
heap或INDEX的膨胀有些时候并不是因为回收慢,而是有些是无法被回收的垃圾,通常被称为膨胀点。本文对膨胀点进行逐一解释(回收慢不解释,可能: worker太少,io太差,worker睡眠太长或频繁,vacuum mem太少放不下所有垃圾行CTID导致多次扫描索引,launcher唤醒周期太长,表太大未支持并行垃圾回收, …)。
除了snapshot too old以外,12新增AM例如zedstore, zheap将彻底解决heap的垃圾版本带来的膨胀问题。
全局catalog 膨胀点
全局catalog包括tbs,db,role等,如下:
postgres=# select relname from pg_class
where reltablespace in
(select oid from pg_tablespace where spcname='pg_global')
and relkind='r';
relname
-----------------------
pg_authid
pg_subscription
pg_database
pg_db_role_setting
pg_tablespace
pg_pltemplate
pg_auth_members
pg_shdepend
pg_shdescription
pg_replication_origin
pg_shseclabel
(11 rows)
哪些垃圾不能被回收?
1、当前实例中最老事务快照之后产生的垃圾记录
2、SLOT catalog_xmin后产生的垃圾记录
3、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
4、备库开启了feedback后,备库中最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录
什么时候可能膨胀?
1、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),
2、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),
3、vacuum_defer_cleanup_age 设置太大
4、整个实例中的 : 长事务, 慢SQL, 慢2pc,
库级catalog 膨胀点
库级catalog包括如下:
postgres=#
select relname from pg_class where relkind='r'
and relnamespace ='pg_catalog'::regnamespace
except
select relname from pg_class where reltablespace in
(select oid from pg_tablespace where spcname = 'pg_global')
and relkind='r';
relname
-------------------------
pg_language
pg_sequence
pg_largeobject
pg_policy
pg_ts_template
pg_attrdef
pg_operator
pg_ts_parser
pg_depend
pg_attribute
pg_ts_config
pg_conversion
pg_inherits
pg_subscription_rel
pg_publication
pg_foreign_table
pg_largeobject_metadata
pg_ts_dict
pg_statistic
pg_init_privs
pg_opfamily
pg_type
pg_am
pg_default_acl
pg_proc
pg_index
pg_rewrite
pg_statistic_ext
pg_constraint
pg_opclass
pg_partitioned_table
pg_namespace
pg_trigger
pg_enum
pg_amop
pg_event_trigger
pg_collation
pg_foreign_server
pg_foreign_data_wrapper
pg_user_mapping
pg_description
pg_cast
pg_publication_rel
pg_aggregate
pg_transform
pg_extension
pg_class
pg_seclabel
pg_amproc
pg_range
pg_ts_config_map
(51 rows)
哪些垃圾不能被回收?
1、当前数据库中最老事务快照之后产生的垃圾记录
2、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
3、备库开启了feedback后,备库返回的最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录
4、SLOT catalog_xmin后产生的垃圾记录(create table, drop table, pg_class, pg_att等)。影响全局(所有DB)
什么时候可能膨胀?
1、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),
2、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),
3、vacuum_defer_cleanup_age 设置太大
4、当前数据库中的 : 长事务, 慢SQL, 慢2pc,
普通对象 膨胀点
用户创建的表、物化视图、索引等。
哪些垃圾不能被回收?
1、当前数据库中最老事务快照之后产生的垃圾记录
2、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
3、备库开启了feedback后,备库返回的最老事务快照(仅指 global xmin)之后产生的垃圾记录。(catalog xmin无影响)
什么时候可能膨胀?
1、standby 开启了 feedback (且standby有慢事务, LONG SQL),
2、vacuum_defer_cleanup_age 设置太大
3、当前数据库中的 : 长事务, 慢SQL, 慢2pc,
WAL文件 膨胀点
wal是指PG的REDO文件。
哪些WAL不能被回收 或 不能被重复利用?
1、从最后一次已正常结束的检查点(检查点开始时刻, 不是结束时刻)开始,所有的REDO文件都不能被回收
2、归档开启后,所有未归档的REDO。(.ready对应的redo文件)
3、启用SLOT后,还没有被SLOT消费的REDO文件
4、设置wal_keep_segments时,当REDO文件数还没有达到wal_keep_segments个时。
什么时候可能膨胀?
1、archive failed ,归档失败
2、user defined archive BUG,用户开启了归档,但是没有正常的将.ready改成.done,使得WAL堆积
3、wal_keep_segments 设置太大,WAL保留过多
4、max_wal_size设置太大,并且checkpoint_completion_target设置太大,导致检查点跨度很大,保留WAL文件很多
5、slot slow(dead) ,包括(physical | logical replication) , restart_lsn 开始的所有WAL文件都要被保留
一些例子
1、创建slot
postgres=# select pg_create_logical_replication_slot('a','test_decoding');
pg_create_logical_replication_slot
------------------------------------
(a,0/92C9C038)
(1 row)
2、查看slot的位点信息
postgres=# select * from pg_get_replication_slots();
slot_name | plugin | slot_type | datoid | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+---------------+-----------+--------+-----------+--------+------------+------+--------------+-------------+---------------------
a | test_decoding | logical | 13585 | f | f | | | 1982645 | 0/92C9BFE8 | 0/92C9C038
(1 row)
3、查看catalog_xmin对应XID的事务提交时间,需要开启事务时间跟踪track_commit_timestamp
postgres=# select pg_xact_commit_timestamp(xmin),pg_xact_commit_timestamp(catalog_xmin) from pg_get_replication_slots();
psql: ERROR: could not get commit timestamp data
HINT: Make sure the configuration parameter "track_commit_timestamp" is set.
4、从RESTART_LSN找到对应WAL文件,从文件中也可以查到大概的时间。
postgres=# select pg_walfile_name(restart_lsn) from pg_get_replication_slots();
pg_walfile_name
--------------------------
000000010000000000000092
(1 row)
postgres=# select * from pg_stat_file('pg_wal/000000010000000000000092');
size | access | modification | change | creation | isdir
----------+------------------------+------------------------+------------------------+----------+-------
16777216 | 2019-06-29 22:56:16+08 | 2019-07-01 09:50:16+08 | 2019-07-01 09:50:16+08 | | f
(1 row)
postgres=# select * from pg_ls_waldir() where name='000000010000000000000092';
name | size | modification
--------------------------+----------+------------------------
000000010000000000000092 | 16777216 | 2019-07-01 09:50:16+08
(1 row)
5、建表
postgres=# create table b(id int);
CREATE TABLE
postgres=# insert into b values (1);
INSERT 0 1
6、消费SLOT WAL
postgres=# select * from pg_logical_slot_get_changes('a',pg_current_wal_lsn(),1);
lsn | xid | data
------------+---------+----------------
0/92C9C0C0 | 1982645 | BEGIN 1982645
0/92CA4A40 | 1982645 | COMMIT 1982645
(2 rows)
postgres=# select * from pg_logical_slot_get_changes('a',pg_current_wal_lsn(),1);
lsn | xid | data
------------+---------+---------------------------------------
0/92CA4A78 | 1982646 | BEGIN 1982646
0/92CA4A78 | 1982646 | table public.b: INSERT: id[integer]:1
0/92CA4AE8 | 1982646 | COMMIT 1982646
(3 rows)
7、删除记录
postgres=# delete from b;
DELETE 1
8、垃圾回收,正常。本地表垃圾不受slot catalog_xmin影响
postgres=# vacuum verbose b;
psql: INFO: vacuuming "public.b"
psql: INFO: "b": removed 1 row versions in 1 pages
psql: INFO: "b": found 1 removable, 0 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 1982648
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
psql: INFO: "b": truncated 1 to 0 pages
DETAIL: CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.09 s
VACUUM
9、建表,删表,使得CATALOG发生变化,产生CATALOG垃圾
postgres=# create table c (id int);
CREATE TABLE
postgres=# drop table c;
DROP TABLE
postgres=# create table c (id int);
CREATE TABLE
postgres=# drop table c;
DROP TABLE
10、垃圾回收catalog,无法回收SLOT后产生的CATALOG垃圾,因为还需要这个CATALOG版本去解析对应WAL的LOGICAL 日志
postgres=# vacuum verbose pg_class;
psql: INFO: vacuuming "pg_catalog.pg_class"
psql: INFO: "pg_class": found 0 removable, 465 nonremovable row versions in 13 out of 13 pages
DETAIL: 2 dead row versions cannot be removed yet, oldest xmin: 1982646
There were 111 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
catalog 受影响
postgres=# vacuum verbose pg_attribute ;
psql: INFO: vacuuming "pg_catalog.pg_attribute"
psql: INFO: "pg_attribute": found 0 removable, 293 nonremovable row versions in 6 out of 62 pages
DETAIL: 14 dead row versions cannot be removed yet, oldest xmin: 1982646
There were 55 unused item identifiers.
Skipped 0 pages due to buffer pins, 55 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
11、长事务不影响其他库的垃圾回收
postgres
postgres=# begin;
BEGIN
postgres=# delete from a;
DELETE 1
db1
db1=# create table b(id int);
CREATE TABLE
db1=# insert into b values (1);
INSERT 0 1
db1=# delete from b;
DELETE 1
db1=# vacuum verbose b;
psql: INFO: vacuuming "public.b"
psql: INFO: "b": removed 1 row versions in 1 pages
psql: INFO: "b": found 1 removable, 0 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 1982671
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
psql: INFO: "b": truncated 1 to 0 pages
DETAIL: CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.09 s
VACUUM
小结
1 全局catalog 膨胀点
哪些垃圾不能被回收?
1、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
2、当前实例中最老事务快照之后产生的垃圾记录
3、SLOT catalog_xmin后产生的垃圾记录
4、备库开启了feedback后,备库中最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录
什么时候可能膨胀?
1、vacuum_defer_cleanup_age 设置太大
2、整个实例中的 : 长事务, 慢SQL, 慢2pc,
3、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),
4、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),
2 库级catalog 膨胀点
哪些垃圾不能被回收?
1、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
2、当前数据库中最老事务快照之后产生的垃圾记录
3、备库开启了feedback后,备库返回的最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录
4、SLOT catalog_xmin后产生的垃圾记录(create table, drop table, pg_class, pg_att等)。影响全局(所有DB)
什么时候可能膨胀?
1、vacuum_defer_cleanup_age 设置太大
2、当前数据库中的 : 长事务, 慢SQL, 慢2pc,
3、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),
4、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),
普通对象 膨胀点
用户创建的表、物化视图、索引等。
哪些垃圾不能被回收?
1、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
2、当前数据库中最老事务快照之后产生的垃圾记录
3、备库开启了feedback后,备库返回的最老事务快照(仅指 global xmin)之后产生的垃圾记录。(catalog xmin无影响)
什么时候可能膨胀?
1、vacuum_defer_cleanup_age 设置太大
2、当前数据库中的 : 长事务, 慢SQL, 慢2pc,
3、standby 开启了 feedback (且standby有慢事务, LONG SQL),
WAL文件 膨胀点
wal是指PG的REDO文件。
哪些WAL不能被回收 或 不能被重复利用?
1、从最后一次已正常结束的检查点(检查点开始时刻, 不是结束时刻)开始,所有的REDO文件都不能被回收
2、归档开启后,所有未归档的REDO。(.ready对应的redo文件)
3、启用SLOT后,还没有被SLOT消费的REDO文件
4、设置wal_keep_segments时,当REDO文件数还没有达到wal_keep_segments个时。
什么时候可能膨胀?
1、archive failed ,归档失败
2、user defined archive BUG,用户开启了归档,但是没有正常的将.ready改成.done,使得WAL堆积
3、wal_keep_segments 设置太大,WAL保留过多
4、max_wal_size设置太大,并且checkpoint_completion_target设置太大,导致检查点跨度很大,保留WAL文件很多
5、slot slow(dead) ,包括(physical | logical replication) , restart_lsn 开始的所有WAL文件都要被保留
参考
switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
{
case HEAPTUPLE_DEAD:
/*
* Ordinarily, DEAD tuples would have been removed by
* heap_page_prune(), but it's possible that the tuple
* state changed since heap_page_prune() looked. In
* particular an INSERT_IN_PROGRESS tuple could have
* changed to DEAD if the inserter aborted. So this
* cannot be considered an error condition.
*
* If the tuple is HOT-updated then it must only be
* removed by a prune operation; so we keep it just as if
* it were RECENTLY_DEAD. Also, if it's a heap-only
* tuple, we choose to keep it, because it'll be a lot
* cheaper to get rid of it in the next pruning pass than
* to treat it like an indexed tuple. Finally, if index
* cleanup is disabled, the second heap pass will not
* execute, and the tuple will not get removed, so we must
* treat it like any other dead tuple that we choose to
* keep.
*
* If this were to happen for a tuple that actually needed
* to be deleted, we'd be in trouble, because it'd
* possibly leave a tuple below the relation's xmin
* horizon alive. heap_prepare_freeze_tuple() is prepared
* to detect that case and abort the transaction,
* preventing corruption.
*/
if (HeapTupleIsHotUpdated(&tuple) ||
HeapTupleIsHeapOnly(&tuple) ||
params->index_cleanup == VACOPT_TERNARY_DISABLED)
nkeep += 1;
else
tupgone = true; /* we can delete the tuple */
all_visible = false;
break;
case HEAPTUPLE_RECENTLY_DEAD:
/*
* If tuple is recently deleted then we must not remove it
* from relation.
*/
nkeep += 1;
all_visible = false;
break;
src/backend/access/heap/heapam_visibility.c
* HeapTupleSatisfiesVacuum()
* visible to any running transaction, used by VACUUM
/*
* HeapTupleSatisfiesVacuum
*
* Determine the status of tuples for VACUUM purposes. Here, what
* we mainly want to know is if a tuple is potentially visible to *any*
* running transaction. If so, it can't be removed yet by VACUUM.
*
* OldestXmin is a cutoff XID (obtained from GetOldestXmin()). Tuples
* deleted by XIDs >= OldestXmin are deemed "recently dead"; they might
* still be visible to some open transaction, so we can't remove them,
* even if we see that the deleting transaction has committed.
*/
HTSV_Result
HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer)
/*
* Deleter committed, but perhaps it was recent enough that some open
* transactions could still see the tuple.
*/
if (!TransactionIdPrecedes(HeapTupleHeaderGetRawXmax(tuple), OldestXmin))
return HEAPTUPLE_RECENTLY_DEAD;
else if (TransactionIdDidCommit(xmax))
{
/*
* The multixact might still be running due to lockers. If the
* updater is below the xid horizon, we have to return DEAD
* regardless -- otherwise we could end up with a tuple where the
* updater has to be removed due to the horizon, but is not pruned
* away. It's not a problem to prune that tuple, because any
* remaining lockers will also be present in newer tuple versions.
*/
if (!TransactionIdPrecedes(xmax, OldestXmin))
return HEAPTUPLE_RECENTLY_DEAD;
return HEAPTUPLE_DEAD;
}
Hot Standby feedback message (F)
Byte1('h')
Identifies the message as a Hot Standby feedback message.
Int64
The client's system clock at the time of transmission, as microseconds since midnight on 2000-01-01.
Int32
The standby's current global xmin, excluding the catalog_xmin from any replication slots. If both this value and the following catalog_xmin are 0 this is treated as a notification that Hot Standby feedback will no longer be sent on this connection. Later non-zero messages may reinitiate the feedback mechanism.
Int32
The epoch of the global xmin xid on the standby.
Int32
The lowest catalog_xmin of any replication slots on the standby. Set to 0 if no catalog_xmin exists on the standby or if hot standby feedback is being disabled.
Int32
The epoch of the catalog_xmin xid on the standby.

http://nalei.zjtcbmw.cn/sjzr/ http://xinguang.sctcbmw.cn/zhangye/ https://toutunhe.tiancebbs.cn/ https://honglan.tiancebbs.cn/hanzhong/ http://ruanwen.xztcxxw.cn/dalian/ http://nalei.zjtcbmw.cn/cqjb/ http://ty.cqtcxxw.cn/fuzhou/ http://cf.lstcxxw.cn/wuzhou/ http://huaguang.jxtcbmw.cn/shenzhen/ http://huaguang.jxtcbmw.cn/zjnb/ http://nalei.zjtcbmw.cn/anqing/ https://bsqgaojing.tiancebbs.cn/ http://ly.shtcxxw.cn/putuoqu/ http://bjtcxxw.cn/yulin/ https://shfxzl.tiancebbs.cn/ http://js.sytcxxw.cn/bhxq/ http://nalei.zjtcbmw.cn/esnyjx/
http://taiying.njtcbmw.cn/sp/ http://taiying.njtcbmw.cn/ershou/ http://bjtcxxw.cn/chengmai/ https://longmenzhen.tiancebbs.cn/ http://huilong.sctcbmw.cn/twty/ https://luhu.tiancebbs.cn/ https://suihua.tiancebbs.cn/ http://js.sytcxxw.cn/chaoyang/ http://yz.cqtcxxw.cn/kashi/ http://huaguang.jxtcbmw.cn/zhengzhou/ http://jingren.hftcbmw.cn/huanggang/ http://shenghuo.china-bbs.com/changningqu/ http://nalei.zjtcbmw.cn/esf/ http://bjtcxxw.cn/xilinguolemeng/ https://jinzao.tiancebbs.cn/ http://gx.lztcxxw.cn/fengxiansh/ http://jinqiang.ahtcbmw.cn/xjkzsa/
床头柜高度:https://www.nanss.com/jiaju/20431.html 个人计算机属于:https://www.nanss.com/xuexi/19800.html 学校支部书记述职报告:https://www.nanss.com/gongzuo/20073.html 美国参议院和众议院的区别:https://www.nanss.com/shenghuo/20154.html 组织生活会对照检查材料2024年度:https://www.nanss.com/gongzuo/20582.html 日出即景:https://www.nanss.com/xuexi/xiezuo/20731.html 立春是什么意思:https://www.nanss.com/shenghuo/20352.html 农村土地承包合同:https://www.nanss.com/gongzuo/18772.html 关于亲情的作文600字:https://www.nanss.com/xuexi/20650.html 经典爱情文章:https://www.nanss.com/yuedu/18843.html 高考作文素材:https://www.nanss.com/xuexi/18690.html 战略性新兴产业:https://www.nanss.com/shenghuo/19573.html 漏电保护器的使用是防止:https://www.nanss.com/shenghuo/20415.html 四年级班主任总结:https://www.nanss.com/gongzuo/19291.html 跟单流程:https://www.nanss.com/gongzuo/20634.html 香山红叶几月份好看:https://www.nanss.com/wenti/18321.html 做党和人民满意的好老师:https://www.nanss.com/gongzuo/19628.html 雪里红的营养价值:https://www.nanss.com/yinshi/18476.html 端午节手抄报内容:https://www.nanss.com/xuexi/20240.html 如何培养团队精神:https://www.nanss.com/gongzuo/20655.html 小年吃什么:https://www.nanss.com/yinshi/20353.html 大熊猫的生活习性:https://www.nanss.com/shenghuo/20606.html 小学体育课教案:https://www.nanss.com/gongzuo/20436.html 莲花楼最大的反派是谁:https://www.nanss.com/wenti/19706.html 羽毛球比赛方案:https://www.nanss.com/xuexi/20519.html 领导班子述职报告:https://www.nanss.com/gongzuo/19817.html 植物生长观察日记:https://www.nanss.com/xuexi/19181.html 上火吃什么可以降火:https://www.nanss.com/wenti/19721.html 貔貅的作用:https://www.nanss.com/jiaju/20418.html 二年级日记大全:https://www.nanss.com/xuexi/xiezuo/20757.html