PostgreSQL 10.0 内置分区表 原作者:digoal/德哥 创作时间:2016-12-15 22:24:58+08 |
doudou586 发布于2016-12-19 22:24:58 评论: 1 浏览: 12662 顶: 985 踩: 1032 |
PostgreSQL 和它的LOGO大象一样,给人非常强大的安全感。
就拿它的Feature来说,一个大的feature要打磨很多年才能正式的合并到master分支。
比如并行计算的特性,从9.4就开始准备,加入了work process和dynamic shared memory的功能,奠定了多进程并行执行的基础。
一直到9.6,经历了3年的开发,大量的测试,终于RELEASE了。
今天要说一说它的分区表,经历了几年的酝酿,终于在10.0加入到master了。(PG的企业版本EDB很早就支持分区表了)
如果你现在需要这个功能,可以使用postgrespro的插件, pg_pathman,否则就等10.0吧,明年9月份左右release。
《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》
PostgreSQL的分区表依旧使用了继承的特性,好消息是, 终于不需要手工写规则了。
在使用方面,我们需要先创建主表,然后创建分区(即子表)。老顽固啊,就是要让你记住PG是有继承的概念的吗?
目前支持range、list分区(10.0 release时应该会支持更多的方法),分区列的类型必须支持btree索引接口(几乎涵盖所有类型, 后面会说到检查方法)。
1. 创建主表
[ PARTITION BY { RANGE | LIST } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ]
2. 创建分区
PARTITION OF parent_table [ ( { column_name [ column_constraint [ ... ] ] | table_constraint } [, ... ] ) ] FOR VALUES partition_bound_spec and partition_bound_spec is: { IN ( expression [, ...] ) -- list分区 | FROM ( { expression | UNBOUNDED } [, ...] ) TO ( { expression | UNBOUNDED } [, ...] ) } -- range分区, unbounded表示无限小或无限大
1. 创建主表
partition by 指定分区表的类型range或list
指定分区列,或表达式作为分区键。
range分区表键:支持指定多列、或多表达式,支持混合(键,非表达式中的列,会自动添加not null的约束)
list分区表键:支持单个列、或单个表达式
分区键必须有对应的btree索引方法的ops(可以查看系统表得到)
select typname from pg_type where oid in (select opcintype from pg_opclass);
主表不会有任何数据,数据会根据分区规则进入对应的分区表
如果插入数据时,分区键的值没有匹配的分区,会报错
不支持全局的unique, primary key, exclude, foreign key约束,只能在对应的分区建立这些约束
PARTITION BY { RANGE | LIST } ( { column_name | ( expression ) } [ opclass ] [, ...] )
The optional PARTITION BY clause specifies a strategy of partitioning the table. The table thus created is called a partitioned table. The parenthesized list of columns or expressions forms the partition key for the table. When using range partitioning, the partition key can include multiple columns or expressions, but for list partitioning, the partition key must consist of a single column or expression. If no btree operator class is specified when creating a partitioned table, the default btree operator class for the datatype will be used. If there is none, an error will be reported. A partitioned table is divided into sub-tables (called partitions), which are created using separate CREATE TABLE commands. The partitioned table is itself empty. A data row inserted into the table is routed to a partition based on the value of columns or expressions in the partition key. If no existing partition matches the values in the new row, an error will be reported. Partitioned tables do not support UNIQUE, PRIMARY KEY, EXCLUDE, or FOREIGN KEY constraints; however, you can define these constraints on individual partitions. When using range partitioning, a NOT NULL constraint is added to each non-expression column in the partition key.
2. 创建分区
主表创建好之后,需要再创建分区(为了体现继承么?就不能一次干掉?)
创建分区表需要指定它的主表是哪个?
指定分区键的属性(范围,LIST值),范围,LIST不能有重叠(这个很好理解,否则重叠部分的数据到底插到哪个分区去呢?)
分区表和主表的 列数量,定义 必须完全一致,(包括OID也必须一致,要么都有,要么都没有)
可以为分区表的列单独增加Default值,或约束。
用户还可以对分区表增加表级约束
如果新增的分区表check约束,名字与主表的约束名一致,则约束内容必须与主表一致
当用户往主表插入数据库时,记录被自动路由到对应的分区,如果没有合适的分区,则报错
如果更新数据,并且更新后的KEY导致数据需要移动到另一分区,则会报错,(意思是分区键可以更新,但是不支持更新后的数据移出到别的分区表)
修改主表的字段名,字段类型时,会自动同时修改所有的分区
TRUNCATE 主表时,会清除所有继承表分区的记录(如果有多级分区,则会一直清除到所有的直接和间接继承的分区)
如果要清除单个分区,请对分区进行操作
如果要删除分区表,可以使用DROP TABLE的DDL语句,注意这个操作会对主表也加access exclusive lock。
PARTITION OF parent_table FOR VALUES partition_bound_spec partition_bound_spec is: { IN ( expression [, ...] ) | FROM ( { expression | UNBOUNDED } [, ...] ) TO ( { expression | UNBOUNDED } [, ...] ) }
Creates the table as partition of the specified parent table. The partition bound specification must correspond to the partitioning method and partition key of the parent table, and must not overlap with any existing partition of that parent. A partition cannot have columns other than those inherited from the parent. That includes the oid column, which can be specified using the WITH (OIDS) clause. Defaults and constraints can optionally be specified for each of the inherited columns. One can also specify table constraints in addition to those inherited from the parent. If a check constraint with the name matching one of the parent's constraint is specified, it is merged with the latter, provided the specified condition is same. Rows inserted into a partitioned table will be automatically routed to the correct partition. If no suitable partition exists, an error will occur. Also, if updating a row in a given partition causes it to move to another partition due to the new partition key, an error will occur. A partition must have the same column names and types as the table of which it is a partition. Therefore, modifications to the column names or types of the partitioned table will automatically propagate to all children, as will operations such as TRUNCATE which normally affect a table and all of its inheritance children. It is also possible to TRUNCATE a partition individually, just as for an inheritance child. Note that dropping a partition with DROP TABLE requires taking an ACCESS EXCLUSIVE lock on the parent table.
wget https://git.postgresql.org/gitweb/?p=postgresql.git;a=snapshot;h=55caaaeba877eac1feb6481fb413fa04ae9046ac;sf=tgz tar -zxvf postgresql-55caaae.tar.gz cd postgresql-55caaae ./configure --prefix=/home/digoal/pgsql10.0 make world -j 32 make install-world export PGPORT=5299 export PGDATA=/disk1/pgdata/pg_root10 export LANG=en_US.utf8 export PGHOME=/home/digoal/pgsql10.0 export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH export DATE=`date +"%Y%m%d%H%M"` export PATH=$PGHOME/bin:$PATH:. export MANPATH=$PGHOME/share/man:$MANPATH export PGHOST=$PGDATA export PGUSER=postgres export PGDATABASE=postgres initdb -D $PGDATA -U postgres -E SQL_ASCII --locale=C vi $PGDATA/postgresql.conf cat postgresql.conf|grep "^[a-z]"|awk -F "#" '{print $1}' listen_addresses = '0.0.0.0' port = 5299 max_connections = 500 superuser_reserved_connections = 13 unix_socket_directories = '.' shared_buffers = 16GB maintenance_work_mem = 1024MB dynamic_shared_memory_type = posix bgwriter_delay = 10ms bgwriter_lru_maxpages = 1000 bgwriter_lru_multiplier = 5.0 bgwriter_flush_after = 0 max_parallel_workers_per_gather = 0 max_parallel_workers = 0 old_snapshot_threshold = -1 backend_flush_after = 0 wal_buffers = 512MB wal_writer_delay = 10ms wal_writer_flush_after = 0 checkpoint_timeout = 55min max_wal_size = 128GB min_wal_size = 8GB checkpoint_completion_target = 0.5 checkpoint_flush_after = 0 random_page_cost = 1.0 effective_cache_size = 400GB log_destination = 'csvlog' logging_collector = on log_truncate_on_rotation = on log_checkpoints = on log_connections = on log_disconnections = on log_error_verbosity = verbose log_timezone = 'PRC' log_autovacuum_min_duration = 0 datestyle = 'iso, mdy' timezone = 'PRC' lc_messages = 'C' lc_monetary = 'C' lc_numeric = 'C' lc_time = 'C' default_text_search_config = 'pg_catalog.english' pg_ctl start
由于分区表的键值列必须支持btree索引,所以建议确认一下。
创建主表时,需要指定ops,如果没有指定,会自动选择类型默认的btree ops.
postgres=# select opcintype::regtype,opcname from pg_opclass ; opcintype | opcname -----------------------------+------------------------ abstime | abstime_ops anyarray | array_ops anyarray | array_ops bit | bit_ops boolean | bool_ops character | bpchar_ops character | bpchar_ops 。。。。。。
1. range 分区表
例子为按时间字段分区的分区表,用户在创建主表时,指定分区键,创建分区时,指定每个分区的范围。
1.1 主表
range分区表,支持混合式的分区键(多列,多表达式,或者混合式)
和前面说的一样,不能使用全局PK
postgres=# create table t_range(id int primary key, info text, crt_time timestamp) partition by range (crt_time, mod(hashtext(info), 4)); ERROR: 0A000: primary key constraints are not supported on partitioned tables LINE 1: create table t_range(id int primary key, info text, crt_time... ^ LOCATION: transformColumnDefinition, parse_utilcmd.c:620 postgres=# create table t_range(id int, info text, crt_time timestamp) partition by range (crt_time); CREATE TABLE
1.2 分区
postgres=# create table t_range_0_201610 partition of t_range (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01'); -- >= 20161001 and < 20161101 CREATE TABLE postgres=# create table t_range_0_201611 partition of t_range (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01'); -- >= 20161101 and < 20161201 CREATE TABLE
甚至你可以将分区表放到不同的schema下面
postgres=# create schema x; CREATE SCHEMA postgres=# create table x.t_range_0_201612 partition of t_range (id primary key, info , crt_time ) for values from ('2016-12-01') to ('2017-01-01'); CREATE TABLE postgres=# \d+ t_range Table "public.t_range" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ----------+-----------------------------+-----------+----------+---------+----------+--------------+------------- id | integer | | | | plain | | info | text | | | | extended | | crt_time | timestamp without time zone | | not null | | plain | | Partition key: RANGE (crt_time) Partitions: t_range_0_201610 FOR VALUES FROM ('2016-10-01 00:00:00') TO ('2016-11-01 00:00:00'), t_range_0_201611 FOR VALUES FROM ('2016-11-01 00:00:00') TO ('2016-12-01 00:00:00'), x.t_range_0_201612 FOR VALUES FROM ('2016-12-01 00:00:00') TO ('2017-01-01 00:00:00')
2. list 分区表
2.1 主表
postgres=# create table t_list(id int, info text, crt_time timestamp) partition by list ( mod(hashtext(info), 4) ); CREATE TABLE
2.2 分区
postgres=# create table t_list_0 partition of t_list (id primary key, info , crt_time ) for values in (0); CREATE TABLE postgres=# create table t_list_1 partition of t_list (id primary key, info , crt_time ) for values in (1); CREATE TABLE postgres=# create table t_list_2 partition of t_list (id primary key, info , crt_time ) for values in (2); CREATE TABLE postgres=# create table t_list_3 partition of t_list (id primary key, info , crt_time ) for values in (3); CREATE TABLE
创建主表
create table t_range_list(id int, info text, crt_time timestamp) partition by list ( mod(hashtext(info), 4) );
创建一级分区,主表
create table t_range_list_0 partition of t_range_list (id , info , crt_time ) for values in (0) partition by range (crt_time); create table t_range_list_1 partition of t_range_list (id , info , crt_time ) for values in (1) partition by range (crt_time); create table t_range_list_2 partition of t_range_list (id , info , crt_time ) for values in (2) partition by range (crt_time); create table t_range_list_3 partition of t_range_list (id , info , crt_time ) for values in (3) partition by range (crt_time);
创建2级分区表
create table t_range_list_0_201611 partition of t_range_list_0 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01'); create table t_range_list_0_201612 partition of t_range_list_0 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01'); create table t_range_list_1_201611 partition of t_range_list_1 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01'); create table t_range_list_1_201612 partition of t_range_list_1 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01'); create table t_range_list_2_201611 partition of t_range_list_2 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01'); create table t_range_list_2_201612 partition of t_range_list_2 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01'); create table t_range_list_3_201611 partition of t_range_list_3 (id primary key, info , crt_time ) for values from ('2016-10-01') to ('2016-11-01'); create table t_range_list_3_201612 partition of t_range_list_3 (id primary key, info , crt_time ) for values from ('2016-11-01') to ('2016-12-01');
1. 分区键条件建议使用同类型的常量、如果是函数则必须使用返回值与分区键类型一致,stable或immutable的函数。
postgres=# explain select * from t_range where crt_time=now()::timestamp; -- 如果需要转换,转换函数必须为immutable(常量),这样可以在进行逻辑推理前被调用 QUERY PLAN --------------------------------------------------------------------------- Append (cost=0.00..23929.55 rows=14 width=42) -> Seq Scan on t_range (cost=0.00..0.00 rows=1 width=44) Filter: (crt_time = (now())::timestamp without time zone) -> Seq Scan on t_range_0_201610 (cost=0.00..23870.00 rows=1 width=17) Filter: (crt_time = (now())::timestamp without time zone) -> Seq Scan on t_range_0_201611 (cost=0.00..29.78 rows=6 width=44) Filter: (crt_time = (now())::timestamp without time zone) -> Seq Scan on t_range_0_201612 (cost=0.00..29.78 rows=6 width=44) Filter: (crt_time = (now())::timestamp without time zone) (9 rows)
postgres=# create or replace function sysdate() returns timestamp as $$ select now()::timestamp ; $$ language sql strict immutable; CREATE FUNCTION postgres=# explain select * from t_range where crt_time=sysdate(); QUERY PLAN ---------------------------------------------------------------------------------------- Append (cost=0.00..24.12 rows=7 width=44) -> Seq Scan on t_range (cost=0.00..0.00 rows=1 width=44) Filter: (crt_time = '2016-12-16 09:58:03.246322'::timestamp without time zone) -> Seq Scan on t_range_0_201612 (cost=0.00..24.12 rows=6 width=44) Filter: (crt_time = '2016-12-16 09:58:03.246322'::timestamp without time zone) (5 rows)
返回类型一致的C代码stable函数也可以被排除,否则会被解释
postgres=# create or replace function sysdate() returns timestamp as $$ select now()::timestamp ; $$ language sql strict stable; CREATE FUNCTION postgres=# explain select * from t_range where crt_time=sysdate(); QUERY PLAN --------------------------------------------------------------------------- Append (cost=0.00..23929.55 rows=14 width=42) -> Seq Scan on t_range (cost=0.00..0.00 rows=1 width=44) Filter: (crt_time = (now())::timestamp without time zone) -> Seq Scan on t_range_0_201610 (cost=0.00..23870.00 rows=1 width=17) Filter: (crt_time = (now())::timestamp without time zone) -> Seq Scan on t_range_0_201611 (cost=0.00..29.78 rows=6 width=44) Filter: (crt_time = (now())::timestamp without time zone) -> Seq Scan on t_range_0_201612 (cost=0.00..29.78 rows=6 width=44) Filter: (crt_time = (now())::timestamp without time zone) (9 rows)
返回类型一致的C代码stable函数也可以被排除
postgres=# create table p(id int, info text, crt_time timestamptz); CREATE TABLE postgres=# create table t(like p) inherits(p); NOTICE: merging column "id" with inherited definition NOTICE: merging column "info" with inherited definition NOTICE: merging column "crt_time" with inherited definition CREATE TABLE postgres=# create table p1(like p) inherits(p); NOTICE: merging column "id" with inherited definition NOTICE: merging column "info" with inherited definition NOTICE: merging column "crt_time" with inherited definition CREATE TABLE postgres=# create table p2(like p) inherits(p); NOTICE: merging column "id" with inherited definition NOTICE: merging column "info" with inherited definition NOTICE: merging column "crt_time" with inherited definition CREATE TABLE postgres=# alter table t add constraint ck check(crt_time >='2016-11-01' and crt_time <'2016-12-01'); ALTER TABLE postgres=# alter table p1 add constraint ck check(crt_time >='2016-10-01' and crt_time <'2016-11-01'); ALTER TABLE postgres=# alter table p2 add constraint ck check(crt_time >='2016-09-01' and crt_time <'2016-10-01'); ALTER TABLE postgres=# explain select * from p where crt_time=now(); QUERY PLAN ---------------------------------------------------------------------------------------- Append (cost=0.00..0.00 rows=1 width=44) -> Seq Scan on p (cost=0.00..0.00 rows=1 width=44) Filter: (crt_time = '2016-12-16 10:02:40.483355+08'::timestamp with time zone) (3 rows)
ALTER TABLE可以将表绑定到分区表的某个分区,也可以将某个已有分区从分区表剔除。
ALTER TABLE [ IF EXISTS ] name ATTACH PARTITION partition_name FOR VALUES partition_bound_spec ALTER TABLE [ IF EXISTS ] name DETACH PARTITION partition_name
绑定时,需要指定分区键的边界(范围、或LIST值)
列名与主表一致、列类型一致、列顺序一致、NOT NULL、CHECK约束一致、不建议UNIQUE,PK,FK。
不能有任何主表非继承的CHECK约束,(建议对约束重建,不要加not inherit属性)
加入的分区需要进行全表扫描,确认所有的记录都在指定的分区键边界内。
如果要避免全表扫描,建议在执行绑定前,对这个表加入CHECK约束,确保约束可以保证记录都在边界内。 但是这种方法不适用于分区键包含表达式并且分区不允许NULL值的情况。
ATTACH PARTITION partition_name FOR VALUES partition_bound_spec This form attaches an existing table (which might itself be partitioned) as a partition of the target table using the same syntax for partition_bound_spec as CREATE TABLE. The partition bound specification must correspond to the partitioning strategy and partition key of the target table. The table to be attached must have all the same columns as the target table and no more; moreover, the column types must also match. Also, it must have all the NOT NULL and CHECK constraints of the target table. Currently UNIQUE, PRIMARY KEY, and FOREIGN KEY constraints are not considered. If any of the CHECK constraints of the table being attached is marked NO INHERIT, the command will fail; such a constraint must be recreated without the NO INHERIT clause. A full table scan is performed on the table being attached to check that no existing row in the table violates the partition constraint. It is possible to avoid this scan by adding a valid CHECK constraint to the table that would allow only the rows satisfying the desired partition constraint before running this command. It will be determined using such a constraint that the table need not be scanned to validate the partition constraint. This does not work, however, if any of the partition keys is an expression and the partition does not accept NULL values. If attaching a list partition that will not accept NULL values, also add NOT NULL constraint to the partition key column, unless it's an expression. DETACH PARTITION partition_name This form detaches specified partition of the target table. The detached partition continues to exist as a standalone table, but no longer has any ties to the table from which it was detached. All the actions except RENAME, SET TABLESPACE, SET SCHEMA, ATTACH PARTITION, and DETACH PARTITION can be combined into a list of multiple alterations to apply in parallel. For example, it is possible to add several columns and/or alter the type of several columns in a single command. This is particularly useful with large tables, since only one pass over the table need be made. You must own the table to use ALTER TABLE. To change the schema or tablespace of a table, you must also have CREATE privilege on the new schema or tablespace. To add the table as a new child of a parent table, you must own the parent table as well. Also, to attach a table as a new partition of the table, you must own the table being attached. To alter the owner, you must also be a direct or indirect member of the new owning role, and that role must have CREATE privilege on the table's schema. (These restrictions enforce that altering the owner doesn't do anything you couldn't do by dropping and recreating the table. However, a superuser can alter ownership of any table anyway.) To add a column or alter a column type or use the OF clause, you must also have USAGE privilege on the data type.
插入
postgres=# insert into t_range select generate_series(1,1000000),'test','2016-10-01'; INSERT 0 1000000 Time: 3849.514 ms (00:03.850)
直接插分区表
postgres=# truncate t_range; TRUNCATE TABLE Time: 72.181 ms postgres=# insert into t_range_0_201610 select generate_series(1,1000000),'test','2016-10-01'; INSERT 0 1000000 Time: 3587.772 ms (00:03.588)
查询,主表未消除
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_range where id=1 and crt_time='2016-10-01'; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Append (cost=0.00..2.45 rows=2 width=30) (actual time=0.037..0.038 rows=1 loops=1) Buffers: shared hit=4 -> Seq Scan on public.t_range (cost=0.00..0.00 rows=1 width=44) (actual time=0.020..0.020 rows=0 loops=1) Output: t_range.id, t_range.info, t_range.crt_time Filter: ((t_range.id = 1) AND (t_range.crt_time = '2016-10-01 00:00:00'::timestamp without time zone)) -> Index Scan using t_range_0_201610_pkey on public.t_range_0_201610 (cost=0.42..2.45 rows=1 width=17) (actual time=0.016..0.016 rows=1 loops=1) Output: t_range_0_201610.id, t_range_0_201610.info, t_range_0_201610.crt_time Index Cond: (t_range_0_201610.id = 1) Filter: (t_range_0_201610.crt_time = '2016-10-01 00:00:00'::timestamp without time zone) Buffers: shared hit=4 Planning time: 0.248 ms Execution time: 0.069 ms (12 rows)
on conflict do
postgres=# insert into t_range select generate_series(1,1000000),'test','2016-10-01' on conflict do nothing; ERROR: 0A000: ON CONFLICT clause is not supported with partitioned tables LOCATION: transformInsertStmt, analyze.c:825 Time: 0.350 ms