PostgreSQL Replica Identity Explained

Replica Identity is crucial for logical replication success

Introduction: DIY Logical Replication

The concept of Replica Identity serves logical replication.

Logical replication fundamentally works by decoding row-level changes (INSERT/UPDATE/DELETE) on published tables and applying them to subscribers.

The mechanism somewhat resembles row-level triggers, where changes are processed row by row after transaction completion.

Suppose you need to implement logical replication manually using triggers, copying changes from table A to table B. The trigger function would typically look like this:

-- Notification trigger
CREATE OR REPLACE FUNCTION replicate_change() RETURNS TRIGGER AS $$
BEGIN
  IF    (TG_OP = 'INSERT') THEN 
  -- INSERT INTO tbl_b VALUES (NEW.col);
  ELSIF (TG_OP = 'DELETE') THEN 
	-- DELETE tbl_b WHERE id = OLD.id;
  ELSIF (TG_OP = 'UPDATE') THEN 
	-- UPDATE tbl_b SET col = NEW.col,... WHERE id = OLD.id;
  END IF;
END; $$ LANGUAGE plpgsql;

The trigger provides two variables: OLD and NEW, containing the record’s previous and new values respectively.

  • INSERT operations only have the NEW variable - we simply insert it into the target table.
  • DELETE operations only have the OLD variable - we delete the record by ID from the target table.
  • UPDATE operations have both OLD and NEW variables - we locate the record in table B using OLD.id and update it with NEW values.

This trigger-based “logical replication” achieves our goal. Similarly, in logical replication, when table A has a primary key column id, deleting a record (e.g., id = 1) only requires sending id = 1 to the subscriber, not the entire deleted tuple. Here, the primary key column id serves as the replica identity.

However, this example assumes that tables A and B have identical schemas with a primary key named id.

For a production-grade logical replication solution (PostgreSQL 10.0+), this assumption is unreasonable. The system cannot require users to always have primary keys or to name them id.

Thus, the concept of Replica Identity was introduced. Replica Identity generalizes and abstracts the OLD.id assumption, telling the logical replication system which information can uniquely identify a record.

Replica Identity

For logical replication, INSERT events don’t require special handling, but DELETE|UPDATE operations must provide a way to identify rows - the Replica Identity. A replica identity is a set of columns that can uniquely identify a record. Conceptually, this is similar to a primary key column set, though columns from non-null unique indexes (candidate keys) can serve the same purpose.

A table included in a logical replication publication must have a Replica Identity configured. This allows the subscriber to locate and update the correct rows for UPDATE and DELETE operations. By default, primary keys and non-null unique indexes can serve as replica identities.

Note that Replica Identity is distinct from a table’s primary key or non-null unique indexes. Replica Identity is a table property that specifies which information is used as an identifier in logical replication records for subscriber-side record location and change application.

As described in the PostgreSQL 13 official documentation, there are four configuration modes for table Replica Identity:

  • Default mode: The standard mode for non-system tables. Uses primary key columns if available, otherwise falls back to full mode.
  • Index mode: Uses columns from a specific qualified index as the identity
  • Full mode: Uses all columns in the record as the replica identity (similar to all columns collectively forming a primary key)
  • Nothing mode: Records no replica identity, meaning UPDATE|DELETE operations cannot be replicated to subscribers.

Querying Replica Identity

A table’s Replica Identity can be checked via pg_class.relreplident.

This is a character-type “enum” indicating which columns are used to assemble the “replica identity”: d = default, f = all columns, i = specific index, n = no replica identity.

To check if a table has usable replica identity index constraints:

SELECT quote_ident(nspname) || '.' || quote_ident(relname) AS name, con.ri AS keys,
       CASE relreplident WHEN 'd' THEN 'default' WHEN 'n' THEN 'nothing' WHEN 'f' THEN 'full' WHEN 'i' THEN 'index' END AS replica_identity
FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid, LATERAL (SELECT array_agg(contype) AS ri FROM pg_constraint WHERE conrelid = c.oid) con
WHERE relkind = 'r' AND nspname NOT IN ('pg_catalog', 'information_schema', 'monitor', 'repack', 'pg_toast')
ORDER BY 2,3;

Configuring Replica Identity

Table replica identity can be modified using ALTER TABLE:

ALTER TABLE tbl REPLICA IDENTITY { DEFAULT | USING INDEX index_name | FULL | NOTHING };
-- Four specific forms
ALTER TABLE t_normal REPLICA IDENTITY DEFAULT;                    -- Use primary key, or FULL if none exists
ALTER TABLE t_normal REPLICA IDENTITY FULL;                       -- Use entire row as identity
ALTER TABLE t_normal REPLICA IDENTITY USING INDEX t_normal_v_key; -- Use unique index
ALTER TABLE t_normal REPLICA IDENTITY NOTHING;                    -- No replica identity

Replica Identity Examples

Let’s demonstrate replica identity effects with a concrete example:

CREATE TABLE test(k text primary key, v int not null unique);

We have a table test with two columns k and v.

INSERT INTO test VALUES('Alice', '1'), ('Bob', '2');
UPDATE test SET v = '3' WHERE k = 'Alice';    -- update Alice value to 3
UPDATE test SET k = 'Oscar' WHERE k = 'Bob';  -- rename Bob to Oscar
DELETE FROM test WHERE k = 'Alice';           -- delete Alice

The corresponding logical decoding results:

table public.test: INSERT: k[text]:'Alice' v[integer]:1
table public.test: INSERT: k[text]:'Bob' v[integer]:2
table public.test: UPDATE: k[text]:'Alice' v[integer]:3
table public.test: UPDATE: old-key: k[text]:'Bob' new-tuple: k[text]:'Oscar' v[integer]:2
table public.test: DELETE: k[text]:'Alice'

By default, PostgreSQL uses the table’s primary key as the replica identity, so UPDATE|DELETE operations locate records using the k column.

We can manually change the replica identity to use the non-null unique column v:

ALTER TABLE test REPLICA IDENTITY USING INDEX test_v_key; -- Replica identity based on UNIQUE index

The same changes now produce these logical decoding results, with v appearing as the identity in all UPDATE|DELETE events:

table public.test: INSERT: k[text]:'Alice' v[integer]:1
table public.test: INSERT: k[text]:'Bob' v[integer]:2
table public.test: UPDATE: old-key: v[integer]:1 new-tuple: k[text]:'Alice' v[integer]:3
table public.test: UPDATE: k[text]:'Oscar' v[integer]:2
table public.test: DELETE: v[integer]:3

Using full identity mode:

ALTER TABLE test REPLICA IDENTITY FULL; -- Table test now uses all columns as replica identity

Here, both k and v serve as identity identifiers in UPDATE|DELETE logs. This is a fallback option for tables without primary keys.

table public.test: INSERT: k[text]:'Alice' v[integer]:1
table public.test: INSERT: k[text]:'Bob' v[integer]:2
table public.test: UPDATE: old-key: k[text]:'Alice' v[integer]:1 new-tuple: k[text]:'Alice' v[integer]:3
table public.test: UPDATE: old-key: k[text]:'Bob' v[integer]:2 new-tuple: k[text]:'Oscar' v[integer]:2
table public.test: DELETE: k[text]:'Alice' v[integer]:3

Using nothing mode:

ALTER TABLE test REPLICA IDENTITY NOTHING; -- Table test now has no replica identity

The logical decoding records show UPDATE operations with only new records (no unique identity from old records), and DELETE operations with no information at all:

table public.test: INSERT: k[text]:'Alice' v[integer]:1
table public.test: INSERT: k[text]:'Bob' v[integer]:2
table public.test: UPDATE: k[text]:'Alice' v[integer]:3
table public.test: UPDATE: k[text]:'Oscar' v[integer]:2
table public.test: DELETE: (no-tuple-data)

Such logical change logs are useless for subscribers. In practice, DELETE|UPDATE operations on tables without replica identity in logical replication will fail immediately.

Replica Identity Details

Table replica identity configuration and table indexes are relatively orthogonal factors.

While various combinations are possible, only three scenarios are practical in real-world usage:

  • Table has a primary key, using default default replica identity
  • Table has no primary key but has non-null unique indexes, explicitly configured with index replica identity
  • Table has neither primary key nor non-null unique indexes, explicitly configured with full replica identity (very inefficient, only as a fallback)
  • All other scenarios cannot properly support logical replication functionality
Replica Identity Mode\Table Constraints Primary Key(p) Non-null Unique Index(u) Neither(n)
default valid x x
index x valid x
full ineff ineff ineff
nothing x x x

Let’s examine some edge cases.

Rebuilding Primary Keys

Suppose we want to rebuild a table’s primary key index to reclaim space due to index bloat:

CREATE TABLE test(k text primary key, v int);
CREATE UNIQUE INDEX test_pkey2 ON test(k);
BEGIN;
ALTER TABLE test DROP CONSTRAINT test_pkey;
ALTER TABLE test ADD PRIMARY KEY USING INDEX test_pkey2;
COMMIT;

In default mode, rebuilding and replacing the primary key constraint and index does not affect the replica identity.

Rebuilding Unique Indexes

Suppose we want to rebuild a non-null unique index to reclaim space:

CREATE TABLE test(k text, v int not null unique);
ALTER TABLE test REPLICA IDENTITY USING INDEX test_v_key;
CREATE UNIQUE INDEX test_v_key2 ON test(v);
-- Replace old Unique index with new test_v_key2 index
BEGIN;
ALTER TABLE test ADD UNIQUE USING INDEX test_v_key2;
ALTER TABLE test DROP CONSTRAINT test_v_key;
COMMIT;

Unlike default mode, in index mode, replica identity is bound to the specific index:

                                    Table "public.test"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 k      | text    |           |          |         | extended |              |
 v      | integer |           | not null |         | plain    |              |
Indexes:
    "test_v_key" UNIQUE CONSTRAINT, btree (v) REPLICA IDENTITY
    "test_v_key2" UNIQUE CONSTRAINT, btree (v)

This means that replacing a UNIQUE index through substitution will cause replica identity loss.

There are two solutions:

  1. Use REINDEX INDEX (CONCURRENTLY) to rebuild the index without losing replica identity information.
  2. Refresh the table’s default replica identity when replacing the index:
BEGIN;
ALTER TABLE test ADD UNIQUE USING INDEX test_v_key2;
ALTER TABLE test REPLICA IDENTITY USING INDEX test_v_key2;
ALTER TABLE test DROP CONSTRAINT test_v_key;
COMMIT;

Note: Removing an index used as an identity. Even though the table configuration still shows index mode, the effect is the same as nothing. So don’t casually modify identity indexes.

Using Unqualified Indexes as Replica Identity

A replica identity requires a unique, non-deferrable, table-wide index on non-nullable columns.

The classic examples are primary key indexes and single-column non-null indexes declared via col type NOT NULL UNIQUE.

The NOT NULL requirement exists because NULL values cannot be compared for equality. A table can have multiple records with NULL values in a UNIQUE column, so nullable columns cannot uniquely identify records. Attempting to use a regular UNIQUE index (without non-null constraints) as a replica identity will fail:

[42809] ERROR: index "t_normal_v_key" cannot be used as replica identity because column "v" is nullable

Using FULL Replica Identity

If no replica identity exists, you can set it to FULL, using the entire row as the replica identity.

Using FULL mode replica identity is inefficient, so this configuration should only be used as a fallback or for very small tables. Each row modification requires a full table scan on the subscriber, which can easily overwhelm the subscriber.

FULL Mode Limitations

Using FULL mode replica identity has another limitation: the subscriber’s table replica identity columns must either match the publisher’s or be fewer in number. Otherwise, correctness cannot be guaranteed. Consider this example:

If both publisher and subscriber tables use FULL replica identity, but the subscriber’s table has an extra column (yes, logical replication allows subscriber tables to have columns not present in the publisher), then the subscriber’s table replica identity includes more columns than the publisher’s. If the publisher deletes a record (f1=a, f2=a), this could delete two records on the subscriber that match the identity equality condition.

     (Publication)       ------>           (Subscription)
|--- f1 ---|--- f2 ---|          |--- f1 ---|--- f2 ---|--- f3 ---|
|    a     |     a    |          |    a     |     a    |     b    |
                                 |    a     |     a    |     c    |

How FULL Mode Handles Duplicate Rows

PostgreSQL’s logical replication can “correctly” handle scenarios with identical rows in FULL mode. Consider this poorly designed table with multiple identical records:

CREATE TABLE shitty_table(
	 f1  TEXT,
	 f2  TEXT,
	 f3  TEXT
);
INSERT INTO shitty_table VALUES ('a', 'a', 'a'), ('a', 'a', 'a'), ('a', 'a', 'a');

In FULL mode, the entire row serves as the replica identity. Suppose we cheat using ctid scanning to delete one of the three identical records:

# SELECT ctid,* FROM shitty_table;
 ctid  | a | b | c
-------+---+---+---
 (0,1) | a | a | a
 (0,2) | a | a | a
 (0,3) | a | a | a

# DELETE FROM shitty_table WHERE ctid = '(0,1)';
DELETE 1

# SELECT ctid,* FROM shitty_table;
 ctid  | a | b | c
-------+---+---+---
 (0,2) | a | a | a
 (0,3) | a | a | a

Logically, using the entire row as an identity, the subscriber would execute:

DELETE FROM shitty_table WHERE f1 = 'a' AND f2 = 'a' AND f3 = 'a'

This would appear to delete all three records. However, because PostgreSQL’s change records operate at the row level, this change only affects the first matching record. Thus, the subscriber’s behavior is to delete one of the three rows, maintaining logical equivalence with the publisher.

Last modified 2025-04-04: bump to v3.4.1 (98648b1)