PostgreSQL 10.0 内置分区表

作者: digoal

日期: 2016-12-15

标签: PostgreSQL , 10.0 , 分区表 , partitiion table , range , list


背景

PostgreSQL 和它的LOGO大象一样,给人非常强大的安全感。

就拿它的Feature来说,一个大的feature要打磨很多年才能正式的合并到master分支。

比如并行计算的特性,从9.4就开始准备,加入了work process和dynamic shared memory的功能,奠定了多进程并行执行的基础。

一直到9.6,经历了3年的开发,大量的测试,终于RELEASE了。

今天要说一说它的分区表,经历了几年的酝酿,终于在10.0加入到master了。(PG的企业版本EDB很早就支持分区表了)

如果你现在需要这个功能,可以使用postgrespro的插件, pg_pathman,否则就等10.0吧,明年9月份左右release。

PostgreSQL 9.5+ 高效分区表实现 - pg_pathman

PostgreSQL 10.0 分区表的设计

PostgreSQL的分区表依旧使用了继承的特性,好消息是, 终于不需要手工写规则了。

在使用方面,我们需要先创建主表,然后创建分区(即子表)。老顽固啊,就是要让你记住PG是有继承的概念的吗?

目前支持range、list分区(10.0 release时应该会支持更多的方法),分区列的类型必须支持btree索引接口(几乎涵盖所有类型, 后面会说到检查方法)。

语法

1. 创建主表

[ PARTITION BY { RANGE | LIST } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ]  

2. 创建分区

PARTITION OF parent_table [ (  
  { column_name [ column_constraint [ ... ] ]  
    | table_constraint }  
    [, ... ]  
) ] FOR VALUES partition_bound_spec  

and partition_bound_spec is:  

{ IN ( expression [, ...] )   -- list分区  
  | FROM ( { expression | UNBOUNDED } [, ...] ) TO ( { expression | UNBOUNDED } [, ...] ) }  
                                          -- range分区, unbounded表示无限小或无限大  

语法解释

1. 创建主表

partition by 指定分区表的类型range或list

指定分区列,或表达式作为分区键。

range分区表键:支持指定多列、或多表达式,支持混合(键,非表达式中的列,会自动添加not null的约束)

list分区表键:支持单个列、或单个表达式

分区键必须有对应的btree索引方法的ops(可以查看系统表得到)

select typname from pg_type where oid in (select opcintype from pg_opclass); 

主表不会有任何数据,数据会根据分区规则进入对应的分区表

如果插入数据时,分区键的值没有匹配的分区,会报错

不支持全局的unique, primary key, exclude, foreign key约束,只能在对应的分区建立这些约束

PARTITION BY { RANGE | LIST } ( { column_name | ( expression ) } [ opclass ] [, ...] )  
The optional PARTITION BY clause specifies a strategy of partitioning the table. The table thus created is called a partitioned table. The parenthesized list of columns or expressions forms the partition key for the table. When using range partitioning, the partition key can include multiple columns or expressions, but for list partitioning, the partition key must consist of a single column or expression. If no btree operator class is specified when creating a partitioned table, the default btree operator class for the datatype will be used. If there is none, an error will be reported. A partitioned table is divided into sub-tables (called partitions), which are created using separate CREATE TABLE commands. The partitioned table is itself empty. A data row inserted into the table is routed to a partition based on the value of columns or expressions in the partition key. If no existing partition matches the values in the new row, an error will be reported. Partitioned tables do not support UNIQUE, PRIMARY KEY, EXCLUDE, or FOREIGN KEY constraints; however, you can define these constraints on individual partitions. When using range partitioning, a NOT NULL constraint is added to each non-expression column in the partition key.

2. 创建分区

主表创建好之后,需要再创建分区(为了体现继承么?就不能一次干掉?)

创建分区表需要指定它的主表是哪个?

指定分区键的属性(范围,LIST值),范围,LIST不能有重叠(这个很好理解,否则重叠部分的数据到底插到哪个分区去呢?)

分区表和主表的 列数量,定义 必须完全一致,(包括OID也必须一致,要么都有,要么都没有)

可以为分区表的列单独增加Default值,或约束。

用户还可以对分区表增加表级约束

如果新增的分区表check约束,名字与主表的约束名一致,则约束内容必须与主表一致

当用户往主表插入数据库时,记录被自动路由到对应的分区,如果没有合适的分区,则报错

如果更新数据,并且更新后的KEY导致数据需要移动到另一分区,则会报错,(意思是分区键可以更新,但是不支持更新后的数据移出到别的分区表)

修改主表的字段名,字段类型时,会自动同时修改所有的分区

TRUNCATE 主表时,会清除所有继承表分区的记录(如果有多级分区,则会一直清除到所有的直接和间接继承的分区)

如果要清除单个分区,请对分区进行操作

如果要删除分区表,可以使用DROP TABLE的DDL语句,注意这个操作会对主表也加access exclusive lock。

PARTITION OF parent_table FOR VALUES partition_bound_spec  

partition_bound_spec is:    

{ IN ( expression [, ...] ) |  
  FROM ( { expression | UNBOUNDED } [, ...] ) TO ( { expression | UNBOUNDED } [, ...] ) }  
Creates the table as partition of the specified parent table. The partition bound specification must correspond to the partitioning method and partition key of the parent table, and must not overlap with any existing partition of that parent. A partition cannot have columns other than those inherited from the parent. That includes the oid column, which can be specified using the WITH (OIDS) clause. Defaults and constraints can optionally be specified for each of the inherited columns. One can also specify table constraints in addition to those inherited from the parent. If a check constraint with the name matching one of the parent's constraint is specified, it is merged with the latter, provided the specified condition is same. Rows inserted into a partitioned table will be automatically routed to the correct partition. If no suitable partition exists, an error will occur. Also, if updating a row in a given partition causes it to move to another partition due to the new partition key, an error will occur. A partition must have the same column names and types as the table of which it is a partition. Therefore, modifications to the column names or types of the partitioned table will automatically propagate to all children, as will operations such as TRUNCATE which normally affect a table and all of its inheritance children. It is also possible to TRUNCATE a partition individually, just as for an inheritance child. Note that dropping a partition with DROP TABLE requires taking an ACCESS EXCLUSIVE lock on the parent table.

例子

测试版本编译

wget https://git.postgresql.org/gitweb/?p=postgresql.git;a=snapshot;h=55caaaeba877eac1feb6481fb413fa04ae9046ac;sf=tgz

tar -zxvf postgresql-55caaae.tar.gz  

cd postgresql-55caaae  

./configure --prefix=/home/digoal/pgsql10.0  

make world -j 32  

make install-world  

export PGPORT=5299  
export PGDATA=/disk1/pgdata/pg_root10  
export LANG=en_US.utf8  
export PGHOME=/home/digoal/pgsql10.0  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH 
export DATE=`date +"%Y%m%d%H%M"`  
export PATH=$PGHOME/bin:$PATH:.  
export MANPATH=$PGHOME/share/man:$MANPATH  
export PGHOST=$PGDATA  
export PGUSER=postgres  
export PGDATABASE=postgres  


initdb -D $PGDATA -U postgres -E SQL_ASCII --locale=C  

vi $PGDATA/postgresql.conf  
cat postgresql.conf|grep "^[a-z]"|awk -F "#" '{print $1}'  

listen_addresses = '0.0.0.0'  
port = 5299  
max_connections = 500  
superuser_reserved_connections = 13  
unix_socket_directories = '.'  
shared_buffers = 16GB  
maintenance_work_mem = 1024MB  
dynamic_shared_memory_type = posix  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 1000  
bgwriter_lru_multiplier = 5.0  
bgwriter_flush_after = 0  
max_parallel_workers_per_gather = 0  
max_parallel_workers = 0              
old_snapshot_threshold = -1  
backend_flush_after = 0  
wal_buffers = 512MB  
wal_writer_delay = 10ms  
wal_writer_flush_after = 0  
checkpoint_timeout = 55min  
max_wal_size = 128GB  
min_wal_size = 8GB  
checkpoint_completion_target = 0.5  
checkpoint_flush_after = 0  
random_page_cost = 1.0  
effective_cache_size = 400GB  
log_destination = 'csvlog'  
logging_collector = on  
log_truncate_on_rotation = on  
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_error_verbosity = verbose    
log_timezone = 'PRC'  
log_autovacuum_min_duration = 0  
datestyle = 'iso, mdy'  
timezone = 'PRC'  
lc_messages = 'C'  
lc_monetary = 'C'  
lc_numeric = 'C'  
lc_time = 'C'  
default_text_search_config = 'pg_catalog.english'  

pg_ctl start  

查找可用的opclass

由于分区表的键值列必须支持btree索引,所以建议确认一下。

创建主表时,需要指定ops,如果没有指定,会自动选择类型默认的btree ops.

postgres=# select opcintype::regtype,opcname from pg_opclass ;  
          opcintype          |        opcname           
-----------------------------+------------------------  
 abstime                     | abstime_ops  
 anyarray                    | array_ops  
 anyarray                    | array_ops  
 bit                         | bit_ops  
 boolean                     | bool_ops  
 character                   | bpchar_ops  
 character                   | bpchar_ops  
 。。。。。。  

分区表

1. range 分区表

例子为按时间字段分区的分区表,用户在创建主表时,指定分区键,创建分区时,指定每个分区的范围。

1.1 主表

range分区表,支持混合式的分区键(多列,多表达式,或者混合式)

和前面说的一样,不能使用全局PK

postgres=# create table t_range(id int primary key, info text, crt_time timestamp) partition by 
range (crt_time, mod(hashtext(info), 4));  
ERROR:  0A000: primary key constraints are not supported on partitioned tables  
LINE 1: create table t_range(id int primary key, info text, crt_time...  
                                    ^  
LOCATION:  transformColumnDefinition, parse_utilcmd.c:620  


postgres=# create table t_range(id int, info text, crt_time timestamp) partition by range (crt_time);  
CREATE TABLE  

1.2 分区

postgres=# create table t_range_0_201610 partition of t_range (id  primary key, info , crt_time ) for values 
from ('2016-10-01') to ('2016-11-01');  -- >= 20161001 and < 20161101  
CREATE TABLE  
postgres=# create table t_range_0_201611 partition of t_range (id  primary key, info , crt_time ) for values 
from ('2016-11-01') to ('2016-12-01');  -- >= 20161101 and < 20161201  
CREATE TABLE  

甚至你可以将分区表放到不同的schema下面

postgres=# create schema x;
CREATE SCHEMA

postgres=# create table x.t_range_0_201612 partition of t_range (id  primary key, info , crt_time ) for values from ('2016-12-01') to ('2017-01-01'); 
CREATE TABLE

postgres=# \d+ t_range
                                             Table "public.t_range"
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description 
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 id       | integer                     |           |          |         | plain    |              | 
 info     | text                        |           |          |         | extended |              | 
 crt_time | timestamp without time zone |           | not null |         | plain    |              | 
Partition key: RANGE (crt_time)
Partitions: t_range_0_201610 FOR VALUES FROM ('2016-10-01 00:00:00') TO ('2016-11-01 00:00:00'),
            t_range_0_201611 FOR VALUES FROM ('2016-11-01 00:00:00') TO ('2016-12-01 00:00:00'),
            x.t_range_0_201612 FOR VALUES FROM ('2016-12-01 00:00:00') TO ('2017-01-01 00:00:00')

2. list 分区表

2.1 主表

postgres=# create table t_list(id int, info text, crt_time timestamp) partition by 
list ( mod(hashtext(info), 4) );  
CREATE TABLE  

2.2 分区

postgres=# create table t_list_0 partition of t_list (id  primary key, info , crt_time ) for values in (0);  
CREATE TABLE  
postgres=# create table t_list_1 partition of t_list (id  primary key, info , crt_time ) for values in (1);  
CREATE TABLE  
postgres=# create table t_list_2 partition of t_list (id  primary key, info , crt_time ) for values in (2);  
CREATE TABLE  
postgres=# create table t_list_3 partition of t_list (id  primary key, info , crt_time ) for values in (3);  
CREATE TABLE  

多级分区

创建主表

create table t_range_list(id int, info text, crt_time timestamp) partition by list ( mod(hashtext(info), 4) );  

创建一级分区,主表

create table t_range_list_0 partition of t_range_list (id  , info , crt_time ) for values in (0) partition by 
range (crt_time);  
create table t_range_list_1 partition of t_range_list (id  , info , crt_time ) for values in (1) partition by 
range (crt_time);  
create table t_range_list_2 partition of t_range_list (id  , info , crt_time ) for values in (2) partition by 
range (crt_time);  
create table t_range_list_3 partition of t_range_list (id  , info , crt_time ) for values in (3) partition by 
range (crt_time);  

创建2级分区表

create table t_range_list_0_201611 partition of t_range_list_0 (id  primary key, info , crt_time ) for values 
from ('2016-10-01') to ('2016-11-01');  
create table t_range_list_0_201612 partition of t_range_list_0 (id  primary key, info , crt_time ) for values 
from ('2016-11-01') to ('2016-12-01');  

create table t_range_list_1_201611 partition of t_range_list_1 (id  primary key, info , crt_time ) for values 
from ('2016-10-01') to ('2016-11-01');  
create table t_range_list_1_201612 partition of t_range_list_1 (id  primary key, info , crt_time ) for values 
from ('2016-11-01') to ('2016-12-01');  

create table t_range_list_2_201611 partition of t_range_list_2 (id  primary key, info , crt_time ) for values 
from ('2016-10-01') to ('2016-11-01');  
create table t_range_list_2_201612 partition of t_range_list_2 (id  primary key, info , crt_time ) for values 
from ('2016-11-01') to ('2016-12-01');  

create table t_range_list_3_201611 partition of t_range_list_3 (id  primary key, info , crt_time ) for values 
from ('2016-10-01') to ('2016-11-01');  
create table t_range_list_3_201612 partition of t_range_list_3 (id  primary key, info , crt_time ) for values 
from ('2016-11-01') to ('2016-12-01');  

执行计划

1. 分区键条件建议使用同类型的常量、如果是函数则必须使用返回值与分区键类型一致,stable或immutable的函数。

postgres=# explain select * from t_range where crt_time=now()::timestamp;  
 -- 如果需要转换,转换函数必须为immutable(常量),这样可以在进行逻辑推理前被调用  
                                QUERY PLAN                            
---------------------------------------------------------------------------
 Append  (cost=0.00..23929.55 rows=14 width=42)
   ->  Seq Scan on t_range  (cost=0.00..0.00 rows=1 width=44)
         Filter: (crt_time = (now())::timestamp without time zone)
   ->  Seq Scan on t_range_0_201610  (cost=0.00..23870.00 rows=1 width=17)
         Filter: (crt_time = (now())::timestamp without time zone)
   ->  Seq Scan on t_range_0_201611  (cost=0.00..29.78 rows=6 width=44)
         Filter: (crt_time = (now())::timestamp without time zone)
   ->  Seq Scan on t_range_0_201612  (cost=0.00..29.78 rows=6 width=44)
         Filter: (crt_time = (now())::timestamp without time zone)
(9 rows)


postgres=# create or replace function sysdate() returns timestamp as $$
  select now()::timestamp ;
$$ language sql strict immutable;
CREATE FUNCTION

postgres=# explain select * from t_range where crt_time=sysdate();
                                       QUERY PLAN                                      
----------------------------------------------------------------------------------------
 Append  (cost=0.00..24.12 rows=7 width=44)
   ->  Seq Scan on t_range  (cost=0.00..0.00 rows=1 width=44)
         Filter: (crt_time = '2016-12-16 09:58:03.246322'::timestamp without time zone)
   ->  Seq Scan on t_range_0_201612  (cost=0.00..24.12 rows=6 width=44)
         Filter: (crt_time = '2016-12-16 09:58:03.246322'::timestamp without time zone)
(5 rows)

返回类型一致的C代码stable函数也可以被排除,否则会被解释

postgres=# create or replace function sysdate() returns timestamp as $$
  select now()::timestamp ;
$$ language sql strict stable;
CREATE FUNCTION

postgres=# explain select * from t_range where crt_time=sysdate();
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Append  (cost=0.00..23929.55 rows=14 width=42)
   ->  Seq Scan on t_range  (cost=0.00..0.00 rows=1 width=44)
         Filter: (crt_time = (now())::timestamp without time zone)
   ->  Seq Scan on t_range_0_201610  (cost=0.00..23870.00 rows=1 width=17)
         Filter: (crt_time = (now())::timestamp without time zone)
   ->  Seq Scan on t_range_0_201611  (cost=0.00..29.78 rows=6 width=44)
         Filter: (crt_time = (now())::timestamp without time zone)
   ->  Seq Scan on t_range_0_201612  (cost=0.00..29.78 rows=6 width=44)
         Filter: (crt_time = (now())::timestamp without time zone)
(9 rows)

返回类型一致的C代码stable函数也可以被排除

postgres=# create table p(id int, info text, crt_time timestamptz);
CREATE TABLE
postgres=# create table t(like p) inherits(p);
NOTICE:  merging column "id" with inherited definition
NOTICE:  merging column "info" with inherited definition
NOTICE:  merging column "crt_time" with inherited definition
CREATE TABLE
postgres=# create table p1(like p) inherits(p);
NOTICE:  merging column "id" with inherited definition
NOTICE:  merging column "info" with inherited definition
NOTICE:  merging column "crt_time" with inherited definition
CREATE TABLE
postgres=# create table p2(like p) inherits(p);
NOTICE:  merging column "id" with inherited definition
NOTICE:  merging column "info" with inherited definition
NOTICE:  merging column "crt_time" with inherited definition
CREATE TABLE
postgres=# alter table t add constraint ck check(crt_time >='2016-11-01' and crt_time <'2016-12-01');
ALTER TABLE
postgres=# alter table p1 add constraint ck check(crt_time >='2016-10-01' and crt_time <'2016-11-01');
ALTER TABLE
postgres=# alter table p2 add constraint ck check(crt_time >='2016-09-01' and crt_time <'2016-10-01');
ALTER TABLE
postgres=# explain select * from p where crt_time=now();
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Append  (cost=0.00..0.00 rows=1 width=44)
   ->  Seq Scan on p  (cost=0.00..0.00 rows=1 width=44)
         Filter: (crt_time = '2016-12-16 10:02:40.483355+08'::timestamp with time zone)
(3 rows)

绑定分区,解绑分区

ALTER TABLE可以将表绑定到分区表的某个分区,也可以将某个已有分区从分区表剔除。

ALTER TABLE [ IF EXISTS ] name
    ATTACH PARTITION partition_name FOR VALUES partition_bound_spec
ALTER TABLE [ IF EXISTS ] name
    DETACH PARTITION partition_name

解释

绑定时,需要指定分区键的边界(范围、或LIST值)

列名与主表一致、列类型一致、列顺序一致、NOT NULL、CHECK约束一致、不建议UNIQUE,PK,FK。

不能有任何主表非继承的CHECK约束,(建议对约束重建,不要加not inherit属性)

加入的分区需要进行全表扫描,确认所有的记录都在指定的分区键边界内。

如果要避免全表扫描,建议在执行绑定前,对这个表加入CHECK约束,确保约束可以保证记录都在边界内。 但是这种方法不适用于分区键包含表达式并且分区不允许NULL值的情况。

ATTACH PARTITION partition_name FOR VALUES partition_bound_spec This form attaches an existing table (which might itself be partitioned) as a partition of the target table using the same syntax for partition_bound_spec as CREATE TABLE. The partition bound specification must correspond to the partitioning strategy and partition key of the target table. The table to be attached must have all the same columns as the target table and no more; moreover, the column types must also match. Also, it must have all the NOT NULL and CHECK constraints of the target table. Currently UNIQUE, PRIMARY KEY, and FOREIGN KEY constraints are not considered. If any of the CHECK constraints of the table being attached is marked NO INHERIT, the command will fail; such a constraint must be recreated without the NO INHERIT clause. A full table scan is performed on the table being attached to check that no existing row in the table violates the partition constraint. It is possible to avoid this scan by adding a valid CHECK constraint to the table that would allow only the rows satisfying the desired partition constraint before running this command. It will be determined using such a constraint that the table need not be scanned to validate the partition constraint. This does not work, however, if any of the partition keys is an expression and the partition does not accept NULL values. If attaching a list partition that will not accept NULL values, also add NOT NULL constraint to the partition key column, unless it's an expression. DETACH PARTITION partition_name This form detaches specified partition of the target table. The detached partition continues to exist as a standalone table, but no longer has any ties to the table from which it was detached. All the actions except RENAME, SET TABLESPACE, SET SCHEMA, ATTACH PARTITION, and DETACH PARTITION can be combined into a list of multiple alterations to apply in parallel. For example, it is possible to add several columns and/or alter the type of several columns in a single command. This is particularly useful with large tables, since only one pass over the table need be made. You must own the table to use ALTER TABLE. To change the schema or tablespace of a table, you must also have CREATE privilege on the new schema or tablespace. To add the table as a new child of a parent table, you must own the parent table as well. Also, to attach a table as a new partition of the table, you must own the table being attached. To alter the owner, you must also be a direct or indirect member of the new owning role, and that role must have CREATE privilege on the table's schema. (These restrictions enforce that altering the owner doesn't do anything you couldn't do by dropping and recreating the table. However, a superuser can alter ownership of any table anyway.) To add a column or alter a column type or use the OF clause, you must also have USAGE privilege on the data type.

性能测试

range 分区表

插入

postgres=# insert into t_range select generate_series(1,1000000),'test','2016-10-01';  
INSERT 0 1000000  
Time: 3849.514 ms (00:03.850)  

直接插分区表

postgres=# truncate t_range;  
TRUNCATE TABLE  
Time: 72.181 ms  
postgres=# insert into t_range_0_201610 select generate_series(1,1000000),'test','2016-10-01';  
INSERT 0 1000000  
Time: 3587.772 ms (00:03.588)  

查询,主表未消除

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_range where id=1 and crt_time='2016-10-01';
  
                                                QUERY PLAN                                                          
--------------------------------------------------------------------------------------------------------------  
 Append  (cost=0.00..2.45 rows=2 width=30) (actual time=0.037..0.038 rows=1 loops=1)  
   Buffers: shared hit=4  
   ->  Seq Scan on public.t_range  (cost=0.00..0.00 rows=1 width=44) (actual time=0.020..0.020 rows=0 loops=1)  
         Output: t_range.id, t_range.info, t_range.crt_time  
         Filter: ((t_range.id = 1) AND (t_range.crt_time = '2016-10-01 00:00:00'::timestamp without time zone))  
   ->  Index Scan using t_range_0_201610_pkey on public.t_range_0_201610  (cost=0.42..2.45 rows=1 width=17) 
                                                                  (actual time=0.016..0.016 rows=1 loops=1)  
         Output: t_range_0_201610.id, t_range_0_201610.info, t_range_0_201610.crt_time  
         Index Cond: (t_range_0_201610.id = 1)  
         Filter: (t_range_0_201610.crt_time = '2016-10-01 00:00:00'::timestamp without time zone)  
         Buffers: shared hit=4  
 Planning time: 0.248 ms  
 Execution time: 0.069 ms  
(12 rows)  

on conflict do

postgres=# insert into t_range select generate_series(1,1000000),'test','2016-10-01' on conflict do nothing;

ERROR:  0A000: ON CONFLICT clause is not supported with partitioned tables  
LOCATION:  transformInsertStmt, analyze.c:825  
Time: 0.350 ms  

参考

  1. https://www.postgresql.org/docs/devel/static/sql-createtable.html
  2. 《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》
  3. 《PostgreSQL 优化器逻辑推理能力 源码解析》

pg_bot_banner.jpg

请在登录后发表评论,否则无法保存。
© 2010 PostgreSQL中文社区