Don't Update! Rollback Issued on Release Day: PostgreSQL Faces a Major Setback

As the old saying goes, never release code on a Friday. Although PostgreSQL’s recent minor release carefully avoided a Friday launch, it still handed the community a full week of extra work: PostgreSQL will ship an unscheduled emergency update next Thursday, namely PostgreSQL 17.2, 16.6, 15.10, 14.15, 13.18, and even 12.22 for the just-EOLed PG 12.

This is the first time in a decade that such a thing has happened: on the very day of a PostgreSQL release, the new version was pulled because of problems discovered by the community. There are two reasons for the emergency release. The first is a regression introduced by the fix for the CVE-2024-10978 security vulnerability, which isn’t a major concern. The real problem is that the new minor versions changed the ABI, causing extensions that depend on ABI stability, such as TimescaleDB, to crash.

The issue of PostgreSQL minor version ABI compatibility was actually raised by Yuri back in June at PGConf 2024. During the extensions summit and his talk “Pushing boundaries with extensions, for extension”, he brought up this concern, but it didn’t receive much attention. Now it has exploded spectacularly, and I imagine Yuri is probably shrugging his shoulders saying: “Told you so.”

In short, the PostgreSQL community strongly recommends that users not upgrade PostgreSQL in the coming week. Tom Lane has proposed an unscheduled emergency minor release next Thursday that reverts these changes and supersedes 17.1, 16.5, and the other affected versions, essentially treating them as if they “never existed.” As a result, Pigsty 3.1, which was due out in the next couple of days with PostgreSQL 17.1 as its default, will also be delayed by a week.

Overall, I believe this incident will turn out to be a net positive. First, it is not a quality problem in the core database itself. Second, because it was discovered on the very day of release and the rollout was promptly halted, users were not substantially affected. Unlike vulnerabilities in other databases, chips, or operating systems that have already caused widespread damage by the time they are found, this one was caught early; apart from a few overeager early upgraders and unlucky fresh installations, the impact should be small. It is reminiscent of the recent xz backdoor incident, which was also discovered by a PostgreSQL developer, Andres Freund, while benchmarking PostgreSQL. Both episodes highlight the vitality and vigilance of the PostgreSQL ecosystem.


What Happened

On the morning of November 14th, an email on the PostgreSQL Hackers mailing list pointed out that the new minor versions had in fact broken the ABI. This is not a problem for the PostgreSQL server itself, but it breaks the implicit contract between the core server and its extensions, causing extensions like TimescaleDB to fail on the new minor versions.

PostgreSQL extensions are built for specific major versions on specific operating system distributions. For example, PostGIS, TimescaleDB, and Citus are built separately for each yearly major version: PG 12, 13, 14, 15, 16, and 17. An extension built for PG 16.0 is generally expected to keep working on PG 16.1, 16.2, … 16.x, which means you can perform rolling minor-version upgrades of the core server without worrying about your extensions.

However, this is not an explicit promise but an implicit community understanding: the ABI is an internal implementation detail, so strictly speaking no such guarantee exists. PostgreSQL has simply behaved so well in the past that everyone grew accustomed to it, and the assumption became baked into many places, including PGDG repository package naming and installation scripts.

This time, though, PG 17.1 and the corresponding releases for PG 16 down to 12 changed the size of an internal structure. As a result, extensions compiled against PG 17.0 and loaded into 17.1 may perform illegal memory writes or crash. Note that this does not affect users running the PostgreSQL server on its own, and PostgreSQL has internal assertions to detect such situations.

For users of extensions like TimescaleDB, however, this means that unless your extensions have been recompiled for the current minor version, you are exposed to this risk. And under the current maintenance practice of the PGDG repositories, an extension package is only built when a new version of the extension itself is released, against whatever PG minor version is the latest at that moment.

Marco Slot of Crunchy Data wrote a detailed thread explaining the PostgreSQL ABI issue; interested readers can find it here:

https://x.com/marcoslot/status/1857403646134153438


How to Avoid Such Problems

As I mentioned previously in “PG’s Ultimate Achievement: The Most Complete PG Extension Repository”, I maintain a repository of many PG extension plugins for EL and Debian/Ubuntu, covering nearly half of the extensions in the entire PG ecosystem.

As Yuri pointed out earlier, the ABI problem goes away as long as your extensions are compiled for the exact PostgreSQL minor version you are running. That’s why I recompile and repackage these extensions whenever a new minor version is released.


Last month I had just finished compiling all the extensions for 17.0 and was about to start rebuilding them for 17.1. That now looks unnecessary, since 17.2 will roll back the ABI change. Extensions compiled against 17.0 can therefore keep being used, but once 17.2 is out I will still recompile and repackage everything against 17.2 and the corresponding releases of the other major versions.

If you habitually install PostgreSQL and extensions straight from the internet but don’t promptly upgrade your minor version, you really do face this risk: newly installed extensions are not compiled for your older server version and may crash because of ABI mismatches.


To be honest, I ran into this problem in production quite early on. That is why, when developing Pigsty, an out-of-the-box PostgreSQL distribution, I chose from day one to first download all required packages and their dependencies locally, build a local software repository from them, and then serve that Yum/Apt repository to every node that needs it. This ensures that all nodes in the environment install the same versions from one consistent snapshot, in which the extension versions match the server version.

Moreover, this approach satisfies the requirement of “independent control”: once your deployment is live, you won’t hit absurd situations such as the original software source shutting down or moving, or the upstream repository publishing an incompatible new version or a new dependency, causing major failures when you set up new machines or instances. You keep a complete copy of the software for replication and expansion, so your services can keep running indefinitely without anyone being able to “cut off your lifeline.”


For example, when 17.1 was released, Red Hat had switched its default LLVM version from 17 to 18 just two days earlier, but unfortunately only on EL8 and not on EL9. Anyone installing from the upstream repositories at that moment would have failed outright. After I reported the issue to Devrim, he spent two hours fixing it by adding LLVM 18 to the EL9-specific fix repository.

PS: If you didn’t know about this separate repository, you would probably keep running into the problem even after the fix, until Red Hat fixed it themselves. Pigsty handles all these dirty details for you.


Some might say they could solve such version problems using Docker, which is certainly true. However, running databases in Docker comes with other issues, and these Docker images essentially use the operating system’s package manager in their Dockerfiles to download RPM/DEB packages from official repositories. Ultimately, someone has to do this work…

Of course, supporting different operating systems means a significant maintenance workload. For example, I maintain 143 PG extensions for EL and 144 for Debian/Ubuntu, each of which has to be compiled for 10 OS targets (five distributions, EL 8/9, Ubuntu 22.04/24.04, and Debian 12, each on amd64 and arm64) and 6 PostgreSQL major versions (PG 12 through 17). Altogether that comes to nearly 10,000 packages to build, test, and distribute, including twenty Rust extensions that take half an hour to compile… But honestly, since it’s all a semi-automated pipeline, going from running it once a year to once every three months is acceptable.
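The figures above can be turned into a rough upper bound on the build matrix. The sketch below is plain multiplication under the assumption that every extension is built for every distribution release, architecture, and PostgreSQL major version; the function names are illustrative, and real counts differ because some extensions skip certain targets.

```c
#include <assert.h>

/* Upper bound on builds for one OS family: every extension is assumed to be
 * built for every distribution release, CPU architecture, and PG major version. */
long build_matrix(long extensions, long distro_releases, long arches, long pg_majors)
{
    assert(extensions >= 0 && distro_releases >= 0 && arches >= 0 && pg_majors >= 0);
    return extensions * distro_releases * arches * pg_majors;
}

/* Figures quoted in the text: 143 EL extensions on EL 8/9, 144 Debian-family
 * extensions on Ubuntu 22.04/24.04 and Debian 12, amd64 + arm64, PG 12-17. */
long total_builds_upper_bound(void)
{
    long el  = build_matrix(143, 2, 2, 6);   /* 143 * 24 = 3432 */
    long deb = build_matrix(144, 3, 2, 6);   /* 144 * 36 = 5184 */
    return el + deb;                         /* 8616 */
}
```

Under these assumptions the bound is 8,616 builds, the same order of magnitude as the package count quoted above.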


Appendix: Explanation of the ABI Issue

About the PostgreSQL extension ABI issue in the latest patch versions (17.1, 16.5, etc.)

C code in PostgreSQL extensions includes headers from PostgreSQL itself. When an extension is compiled, functions from the headers are represented as abstract symbols in the binary. These symbols are linked to actual function implementations when the extension is loaded, based on function names. This way, an extension compiled for PostgreSQL 17.0 can typically still load into PostgreSQL 17.1, as long as function names and signatures in the headers haven’t changed (i.e., the Application Binary Interface or “ABI” is stable).
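To make the name-based binding concrete, here is a deliberately simplified toy model in C. A real dynamic linker resolves symbols from the shared object’s dynamic symbol tables; here the “server” exports a small name-to-address table, and the “extension” looks up a function by name and calls it with the signature it was compiled against. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer type for the export table; C allows converting
 * between function pointer types as long as we convert back before calling. */
typedef void (*GenericFn)(void);

/* Signature the extension was compiled against (taken from a "header"). */
typedef int (*AddFn)(int, int);

/* "Server" implementation of an exported function. */
static int server_add(int a, int b)
{
    return a + b;
}

typedef struct Symbol {
    const char *name;
    GenericFn   addr;
} Symbol;

/* The server's export table: only names and addresses, no type information. */
static const Symbol server_exports[] = {
    { "server_add", (GenericFn) server_add },
};

/* Resolve a symbol by name, returning NULL if it is not exported. */
GenericFn resolve_symbol(const char *name)
{
    for (size_t i = 0; i < sizeof server_exports / sizeof server_exports[0]; i++) {
        if (strcmp(server_exports[i].name, name) == 0)
            return server_exports[i].addr;
    }
    return NULL;
}

/* "Extension" code: binds by name, then trusts its compile-time signature. */
int extension_add(int a, int b)
{
    AddFn fn = (AddFn) resolve_symbol("server_add");
    assert(fn != NULL);
    return fn(a, b);
}
```

Nothing in the lookup checks the signature: if a later release changed server_add’s parameters but kept its name, the lookup would still succeed and the call would silently misbehave, which is why both names and signatures have to stay stable.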

Headers also declare structures, which are passed to functions as pointers. Strictly speaking, structure definitions are part of the ABI too, but there are more subtleties here. After compilation, a structure is essentially defined by its size and its field offsets, so renaming a field does not affect the ABI (though it does affect the API). Size changes matter, but more subtly. In most cases, PostgreSQL allocates structures on the heap with a macro (“makeNode”), which uses the compile-time size of the structure and initializes the bytes to 0.
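The distinction between API and ABI changes to a structure can be checked with sizeof and offsetof. In the sketch below, the three struct definitions and the MAKE_NODE macro are hypothetical stand-ins: MAKE_NODE only mimics the idea behind makeNode (a zero-filled allocation whose size is fixed at compile time) using calloc, not PostgreSQL’s own allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A structure as declared in an "old" header. */
typedef struct InfoOld {
    int   kind;
    void *relation;
    int   flags;
} InfoOld;

/* Same layout, one field renamed: source using "relation" breaks (API),
 * but sizes and offsets are unchanged (ABI). */
typedef struct InfoRenamed {
    int   kind;
    void *rel;
    int   flags;
} InfoRenamed;

/* A field appended in a "new" header: the compile-time size grows. */
typedef struct InfoNew {
    int   kind;
    void *relation;
    int   flags;
    void *extra;
} InfoNew;

/* Hypothetical makeNode-like macro: sizeof(T) is fixed when THIS file is compiled. */
#define MAKE_NODE(T) ((T *) calloc(1, sizeof(T)))

/* Returns 1 if renaming the field left the binary layout intact. */
int rename_keeps_layout(void)
{
    return sizeof(InfoOld) == sizeof(InfoRenamed)
        && offsetof(InfoOld, relation) == offsetof(InfoRenamed, rel)
        && offsetof(InfoOld, flags) == offsetof(InfoRenamed, flags);
}

/* A binary built from the old header always asks for sizeof(InfoOld). */
InfoOld *make_info_old(void)
{
    InfoOld *node = MAKE_NODE(InfoOld);
    assert(node != NULL);   /* demo only: real code must handle allocation failure */
    return node;
}
```

The new field is a pointer so that the size change is guaranteed to show up; a small field such as a bool can sometimes fit into existing padding without changing sizeof.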

The difference in 17.1 is that a new boolean field was added to the ResultRelInfo structure, increasing its size. What happens next depends on who calls makeNode. Code in PostgreSQL 17.1 uses the new size; an extension compiled for 17.0 uses the old size. When the extension passes a pointer allocated with the old size to PostgreSQL functions, those functions assume the new size and may write beyond the end of the allocated block. This is generally quite bad: it can overwrite unrelated memory or crash the process.
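The failure mode can be expressed as a bounds check, without actually performing the out-of-bounds write (which would be undefined behavior). The two structs below are hypothetical, simplified stand-ins for the old and new views of ResultRelInfo, and need_lock is an invented field name appended at the end for simplicity; the only detail carried over from the text is that the newer layout has one extra boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Layout baked into an extension compiled against the older header. */
typedef struct RelInfoOld {
    int   range_table_index;
    void *relation;
    void *projection;
} RelInfoOld;

/* Layout used by the newer server: same prefix plus one appended boolean. */
typedef struct RelInfoNew {
    int   range_table_index;
    void *relation;
    void *projection;
    bool  need_lock;        /* hypothetical stand-in for the new field */
} RelInfoNew;

/* The shared prefix must be identical for the comparison below to make sense. */
static_assert(offsetof(RelInfoOld, projection) == offsetof(RelInfoNew, projection),
              "old and new layouts must share a common prefix");

/* True if writing field_size bytes at field_offset runs past alloc_size bytes. */
bool write_overflows(size_t alloc_size, size_t field_offset, size_t field_size)
{
    return field_offset + field_size > alloc_size;
}

/* Scenario from the text: the extension allocates with the OLD size, then the
 * server sets the NEW field through a RelInfoNew pointer. */
bool server_write_overflows_extension_node(void)
{
    return write_overflows(sizeof(RelInfoOld),
                           offsetof(RelInfoNew, need_lock),
                           sizeof(bool));
}
```

The same write is in bounds when the node was allocated with the new size, which is why code compiled entirely against one set of headers never notices the problem.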

When running tests, PostgreSQL has internal checks (assertions) that detect this situation and emit warnings. However, PostgreSQL uses its own allocator, which rounds small allocations up to a power of 2. The ResultRelInfo structure is 376 bytes (on my laptop), which rounds up to 512 bytes; after the change it is 384 bytes, which also rounds up to 512. So this particular change usually does not alter the actual allocation size. Some bytes may be left uninitialized, but that is usually taken care of by calling InitResultRelInfo.
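The rounding argument can be checked with a few lines of C. This is a simplified model of a power-of-two chunk allocator for small requests, not PostgreSQL’s actual allocator code: it ignores the minimum chunk size, chunk headers, and the separate handling of large requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two >= n (simplified model of small-chunk sizing). */
size_t round_up_pow2(size_t n)
{
    assert(n > 0 && n <= (SIZE_MAX >> 1) + 1);   /* keeps the shift below from overflowing */
    size_t chunk = 1;
    while (chunk < n)
        chunk <<= 1;
    return chunk;
}

/* True if two request sizes land in the same chunk size, so a struct growing
 * from old_size to new_size does not change the memory actually handed out. */
bool same_chunk(size_t old_size, size_t new_size)
{
    return round_up_pow2(old_size) == round_up_pow2(new_size);
}
```

With the laptop sizes from the text, round_up_pow2(376) and round_up_pow2(384) are both 512, so same_chunk(376, 384) holds; a struct growing past 512 bytes would instead move to a 1024-byte chunk.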

This issue mainly shows up as warnings in tests or assertion-enabled builds where extensions allocate ResultRelInfo, especially when those tests run with extension binaries compiled against older PostgreSQL versions. Unfortunately, the story doesn’t end there. TimescaleDB is a heavy user of ResultRelInfo and did run into problems with the size change. For example, in one of its code paths it needs to find an index in an array of ResultRelInfo structures, which it does with pointer arithmetic. The array is allocated by PostgreSQL (384 bytes per element), but the TimescaleDB binary assumes 376 bytes, so the computed index is a meaningless number that triggers assertion failures or segfaults: https://github.com/timescale/timescaledb/blob/2.17.2/src/nodes/hypertable_modify.c#L1245…
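Why the index becomes a meaningless number rather than merely an off-by-one can be sketched as follows. Subtracting two pointers divides their byte distance by the element size the binary was compiled with, and C only defines the result when the distance is an exact multiple of that size. An optimizing compiler may exploit this, for example by shifting out the power-of-two factor and multiplying by the modular inverse of the odd factor; GCC is known to lower it this way, but treat this as one plausible mechanism, not a description of the actual TimescaleDB binary. The sketch emulates that lowering with unsigned integers; the function names are hypothetical, and 376/384 are the laptop sizes quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative inverse of an odd number modulo 2^64, via Newton's iteration. */
uint64_t inverse_mod_2_64(uint64_t a)
{
    assert(a % 2 == 1);
    uint64_t x = a;                 /* a * a == 1 (mod 8) for odd a: 3 correct bits */
    for (int i = 0; i < 5; i++)     /* correct bits double each step: 3, 6, 12, 24, 48, 96 */
        x *= 2 - a * x;
    return x;
}

/* "Exact" division of a byte distance by an element size, as a compiler may
 * lower pointer subtraction. Only correct if elem_size divides byte_distance. */
uint64_t exact_div(uint64_t byte_distance, uint64_t elem_size)
{
    assert(elem_size > 0);
    unsigned shift = 0;
    while ((elem_size >> shift) % 2 == 0)   /* strip factors of two, e.g. 376 = 8 * 47 */
        shift++;
    uint64_t odd = elem_size >> shift;
    return (byte_distance >> shift) * inverse_mod_2_64(odd);
}

/* Index an extension computes for element k of an array whose real stride is
 * server_size, while the extension was compiled with ext_size. */
uint64_t index_seen_by_extension(uint64_t k, uint64_t server_size, uint64_t ext_size)
{
    return exact_div(k * server_size, ext_size);
}
```

With matching sizes, index_seen_by_extension(3, 376, 376) returns 3 as expected. With the mismatch, index_seen_by_extension(1, 384, 376) returns 1 plus the inverse of 47 modulo 2^64, a huge value that is useless as an array index.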

The code here isn’t actually wrong; it’s the contract with PostgreSQL that isn’t what was expected. This is an interesting lesson for all of us. Similar issues may exist in other extensions, though few extensions are as advanced as TimescaleDB. Another advanced extension is Citus, and I’ve verified that Citus is safe, although it does show assertion warnings. Everyone is advised to be cautious. The safest approach is to make sure extensions are compiled with headers from the PostgreSQL version you’re running.
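One way to act on this advice is to compare the version an extension was built against with the version of the running server. PG_VERSION_NUM (a compile-time macro from PostgreSQL’s headers) and the server_version_num setting are real and use the same encoding for PostgreSQL 10 and later, for example 170001 for 17.1. The helper below and its three-way policy are hypothetical. PostgreSQL’s own PG_MODULE_MAGIC check already rejects a major-version mismatch at load time, but it does not look at the minor version.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum BuildMatch {
    BUILD_EXACT_MATCH,    /* same major and minor: the safest case */
    BUILD_MINOR_DIFFERS,  /* same major, different minor: usually fine, not guaranteed */
    BUILD_MAJOR_DIFFERS,  /* different major: must not be loaded */
    BUILD_UNKNOWN         /* unparsable or pre-10 version number */
} BuildMatch;

/* Compare a compile-time version number (e.g. PG_VERSION_NUM) with the
 * server_version_num string reported by a running server. */
BuildMatch classify_build(long compiled_num, const char *server_version_num)
{
    assert(server_version_num != NULL);

    char *end = NULL;
    long running = strtol(server_version_num, &end, 10);

    if (end == server_version_num || *end != '\0')
        return BUILD_UNKNOWN;
    if (compiled_num < 100000 || running < 100000)
        return BUILD_UNKNOWN;                 /* sketch handles the 10+ scheme only */

    if (compiled_num / 10000 != running / 10000)
        return BUILD_MAJOR_DIFFERS;
    if (compiled_num % 10000 != running % 10000)
        return BUILD_MINOR_DIFFERS;
    return BUILD_EXACT_MATCH;
}
```

On a live system the running value can be read with SHOW server_version_num. In this incident, a 17.0 build on a 17.1 server lands in BUILD_MINOR_DIFFERS, exactly the case that is normally tolerated but, as 17.1 showed, not guaranteed.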

Last modified 2025-03-22: add postgres blogs (117ac1d)