Postgres Get Replication Slots

  

Slots

PostgreSQL 9.4 and later come with a feature called “replication slots”. They can be used to make WAL archiving and streaming replication more robust and efficient. Read on to learn more, with hands-on examples.

WAL Files

PostgreSQL provides ACID semantics for your data by sequencing transactions into a long series of changes, like an ever-growing changelog. Descriptions of these changes are encoded into “WAL files” – a series of ordered 16MB binary files stored alongside other files inside the server’s data directory.

This means that if you keep around all the WAL files since the creation of the database, you can recreate that database by reading and replaying the changes from the WAL files. This is the basis for streaming replication, WAL archiving and Point-In-Time-Recovery (PITR) in PostgreSQL.

(Note: some objects, like unlogged tables and (before v10) hash indexes, are not captured in WAL files – but that does not compromise the integrity of the data.)

WAL stands for Write Ahead Log, by the way.

Archiving WAL Files

Copying out WAL files as they are generated into a “safe” location is called WAL archiving. A “safe” location is typically an NFS-mounted disk, a file server, an S3 bucket, etc.

Traditionally, a user-supplied script is invoked by PostgreSQL each time a WAL file is generated. The script then copies (using rsync, scp or the like) the file to one or more safe locations.
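As a sketch, a minimal archiving setup in postgresql.conf might look like this (the archive directory path is just an illustrative placeholder):

```shell
# postgresql.conf -- minimal WAL archiving setup (illustrative paths)
archive_mode = on
# %p expands to the path of the WAL file, %f to its file name;
# "test ! -f" refuses to overwrite an already-archived file
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```

A real script would typically also verify the copy and report failure (non-zero exit) so PostgreSQL retries.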

The WAL files stored in the safe location become a form of incremental backup from which a database can be recovered up to any specified point in time.

Log Shipping

Instead of copying the WAL files to a safe location, they can be copied to another machine where another PostgreSQL server is running. This server can be configured to be in “recovery mode”, meaning all it does is apply any new WAL files that it gets. This second server then becomes a “standby” – an identical, warm backup of the “primary” PostgreSQL server.

(The standby can also be made a “hot standby”, where it can also serve read-only queries, thus being a “read replica”.)

The process of “shipping” WAL files (rather than “archiving” them) to one or more standby servers is called “log shipping”. Typically, log shipping is also done using the same archive script as before – it copies newly-created WAL files at the primary over the network to the standby servers.

Streaming Replication

Streaming replication was added to PostgreSQL in order to improve the log shipping process. With streaming replication, the standby server will connect to the primary server over the network (using a different protocol than the standard client protocol). The primary server can then send WAL file contents directly over this connection without having to rely on user-supplied scripts.

This also allows for better tracking. You can know, for example, how far the standby has caught up with the primary in terms of receiving, applying and replaying the changes from the primary.

Streaming replication is popularly used to maintain a failover-ready standby that can be promoted to primary in case the primary fails.

WAL File Retention

Which brings us to the topic of how long the primary should retain WAL files.

Typically, you’ll have a WAL archival script that will copy WAL files to one or more long-term archive locations. If you don’t have any standbys or other streaming replication clients, then basically the server can discard the WAL file once the archive script reports success. You’ll still need some recent WAL files for crash recovery (data from recent WAL files gets replayed during crash recovery).

(BTW: PostgreSQL does not actually discard the files – it renames and recycles them.)

If you have standbys though, there is a problem – you need to keep WAL files around for as long as your slowest standby needs them. If a standby that was taken down for a while comes back online and asks the primary for a WAL file that the primary no longer has, the replication fails.

The logical solution is to make the primary keep track of how far behind each standby is, and to not delete WAL files that any standby still needs. This feature is called replication slots.

Replication Slots

(Note: there are “physical” and “logical” replication slots. We cover only “physical” replication slots in this article.)

Replication slots are uniquely identified by name. Each slot is associated with:

  • the oldest WAL required by the consumer of the slot – WAL files later than this are not deleted during checkpoints
  • the oldest transaction ID required to be retained by the consumer of the slot – rows needed by any transactions later than this are not deleted by vacuum

Let’s try it out. First, we need a new database to experiment with:
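If you’d like to follow along, a fresh throwaway cluster can be created with initdb (the “data” directory name is just an illustrative choice):

```shell
# create a brand-new PostgreSQL cluster in the directory "data"
initdb -D data
```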

Make sure that data/pg_hba.conf allows normal and replication connections:
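A wide-open pg_hba.conf for this kind of local experiment might look like the sketch below – “trust” authentication for everything, including replication connections. Suitable only for a disposable test cluster:

```shell
# data/pg_hba.conf -- trust everything, local experiments only
# TYPE   DATABASE      USER   ADDRESS        METHOD
local    all           all                   trust
host     all           all    127.0.0.1/32   trust
local    replication   all                   trust
host     replication   all    127.0.0.1/32   trust
```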

(This is about as insecure as you can get, but works great for experimenting!) Let’s start the server now:
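Starting the cluster created above is a one-liner with pg_ctl (the log file name is arbitrary):

```shell
# start the server on the default port, logging to "logfile"
pg_ctl -D data -l logfile start
```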

Creating a Replication Slot

Now that we have a running PostgreSQL server, let’s create a replication slot.We can do this using the pg_create_physical_replication_slot function.
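A sketch of the call via psql, using the slot name “slot1” from this walkthrough:

```shell
# create a physical replication slot named "slot1"
psql -c "SELECT * FROM pg_create_physical_replication_slot('slot1');"
```

The function returns the slot name and an LSN; with this single-argument form the LSN is NULL, because no WAL position has been reserved yet.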

Let’s also run pgbench for some traffic. Make a database first:
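Assuming the database is named “bench” (an arbitrary choice for this walkthrough):

```shell
# create a database for pgbench to work against
createdb bench
```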

Then initialize and run pgbench:
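For example, something along these lines – the client count and duration are arbitrary; anything that keeps the server busy will do:

```shell
# create and populate pgbench's tables in "bench"
pgbench -i bench

# run a 4-client workload for 10 minutes to generate WAL traffic
pgbench -c 4 -T 600 bench
```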

Now that we have pgbench busy in the background, we can see that the server is generating WAL files:
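One way to watch this happen (the function name assumes v10+; on earlier versions it is pg_current_xlog_location, and the directory is data/pg_xlog):

```shell
# report the current write-ahead log write position
psql -c "SELECT pg_current_wal_lsn();"

# list the 16MB WAL segment files themselves
ls data/pg_wal
```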

Receiving WALs Using a Slot

We can receive these WAL files using the pg_receivewal command (on versions before 10, this is pg_receivexlog). Let’s run it now, and ask it to use the “slot1” that we created earlier:
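The invocation looks like this (it connects with default connection settings and runs until interrupted):

```shell
mkdir wals

# stream WAL segments into ./wals, tracking progress via slot "slot1"
pg_receivewal -D wals -S slot1
```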

We created a directory called “wals” and asked pg_receivewal to fetch and store the WALs into this directory. Since this is the first time we’re using the slot, pg_receivewal starts fetching from the WAL location at the time it was started – in this case from 0/17000000. When it has fetched 17 and 18 completely and is partway through 19, here is how the “wals” directory looks:

There is a catalog view called pg_replication_slots which gives you information about these slots:
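For example, a query picking out a few of the view’s columns:

```shell
# show each slot's name, type, whether a consumer is attached,
# and the oldest WAL position the slot is holding back
psql -c "SELECT slot_name, slot_type, active, restart_lsn FROM pg_replication_slots;"
```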

PostgreSQL will not remove WAL files prior to “restart_lsn”, even at the riskof filling up the disk.

Let’s stop pg_receivewal for a while now, and let pgbench do more transactions:

More pgbench:

After the pgbench, the current WAL has crept up from 19 to 2B:

while our slot remains at 19. Let’s see what happens when we restart pg_receivewal:

The server had been saving up WAL files from 19 onwards for slot1’s sake, which it serves up immediately on the consumer’s startup, and the consumer then continues to wait for more. The server’s WAL directory (data/pg_wal in v10, data/pg_xlog in earlier versions) can be checked to verify whether the old files were retained or deleted as intended.

WAL Archiving with Slots

What we just did was in fact WAL archiving – saving a copy of every WAL file generated by the server. We did not use the traditional “archive_command” method, but the set of WAL files that we have are just as good for restoration and PITR.

Streaming Replication with Slots

Let’s try doing streaming replication using slots. First, let’s create another slot:
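Same function as before, but with the second argument (immediately_reserve) set to true; “slot2” is the name used in the rest of this walkthrough:

```shell
# create a second slot, reserving a WAL position immediately
psql -c "SELECT * FROM pg_create_physical_replication_slot('slot2', true);"
```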

We added a “true” parameter to the function this time – this sets the “restart_lsn” of the slot to the current REDO LSN. If we take a “pg_basebackup” at this point, the data files will be guaranteed to contain changes up to REDO LSN. Having the slot’s “restart_lsn” set to this value will guarantee that our new standby, bootstrapped from a pg_basebackup, will be able to fetch all changes until it is ready.

Let’s get the backup next. We’ll use the “-S / --slot” parameter to supply the slot name to the pg_basebackup command:
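A sketch of the command, writing the backup into a “data2” directory:

```shell
# take a base backup into data2, using slot2 to track required WAL,
# and write a ready-to-use standby recovery configuration (-R)
pg_basebackup -D data2 -S slot2 -R
```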

Because we created the slot earlier, changes made between the time the slot was created and the completion of the pg_basebackup command will be included in the newly created “data2” directory. pg_basebackup informs us that it has partially written WAL 32, and pg_replication_slots confirms that the restart_lsn has advanced in sync:

The recovery.conf for the standby should contain the slot name (we made pg_basebackup write this using the -R flag):
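On v10/v11 the generated file looks roughly like this (the conninfo details will differ on your machine; from v12 onwards the same settings live in postgresql.auto.conf plus a standby.signal file instead):

```shell
# data2/recovery.conf (illustrative)
standby_mode = 'on'
primary_conninfo = 'host=127.0.0.1 port=5432 user=postgres'
primary_slot_name = 'slot2'
```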

The “primary_slot_name” setting makes the standby use that slot for its streaming replication. Let’s start the standby now:
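Since the primary is already using the default port, the standby needs its own (6001 here is arbitrary):

```shell
# start the standby from data2 on a different port
pg_ctl -D data2 -o "-p 6001" -l standby.log start
```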

We hit a checkpoint in between the backup and the start, so the WAL file has crept up to 33.

Examining Replication Status

We now have two active replication slot consumers (pg_receivewal and the standby). We can check the replication status using the pgmetrics tool:
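A minimal invocation might look like this (exact flags depend on your pgmetrics version and how authentication is set up; with the “trust” config above no password is needed):

```shell
# collect and print a plain-text report from the local primary,
# including outgoing replication and replication slot sections
pgmetrics --no-password bench
```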

Where to go from here

Replication in PostgreSQL represents a whole lot of features, knobs and levers accumulated into a stable codebase over more than a decade. Understandably, it takes a lot of reading and experimenting to get to any level of comfort with it.

Read more about replication and standby servers in the docs here.

We didn’t cover temporary physical replication slots, a feature added in v10. pg_basebackup can also make use of such slots by default.

We also didn’t cover logical replication slots or logical decoding.

Monitoring PostgreSQL With OpsDash

With our own product, OpsDash, you can quickly start monitoring your PostgreSQL servers, and get instant insight into key performance and health metrics including replication stats.

Here’s a default, pre-configured dashboard for PostgreSQL.

OpsDash strives to save you the tedious work of setting up a useful dashboard. The metrics you see here were carefully chosen to represent the most relevant health and performance indicators for a typical PostgreSQL instance.

OpsDash understands the streaming replication feature of PostgreSQL and displays per-slave replication status on the master:

The replication lag (as a length of time) is also visible on a slave’s dashboard. In the graph below (part of the dashboard of a slave), we can see that the slave could not catch up to the master’s changes for a while.

Sign up for a free 14-day trial of OpsDash SaaS today!