PostgreSQL 16 has been released, introducing numerous major features that enhance the database engine's performance and offer new capabilities to developers and administrators. This blog post will explore these important updates.
In the first part, we'll focus on the developer-friendly features of PostgreSQL 16. The following part, set to be published in the coming weeks, will delve into the other major features.
Here, we discuss major enhancements, including the introduction of new SQL/JSON standards, improvements in parallel joins and incremental sorts, the creation of custom collation rules, and other developer-friendly features.
Effective Parallel Joins
In the latest release of PostgreSQL, version 16, one of the major enhancements introduced is the support for FULL OUTER JOINS in parallel hash joins. This is a significant improvement compared to the functionality in earlier versions of the database software.
In previous PostgreSQL versions, the task of performing hash joins was allocated to a single worker. This often led to bottlenecks and inefficiencies, particularly when dealing with large datasets. However, in this current PostgreSQL 16 version, the process of hash join is now distributed and parallelized.
Consider the below quick example, where we are trying to demonstrate the FULL OUTER JOIN, which fetches all the non-matching records from both joining tables.
As shown above, we have a total of 320000 records in the goods table. Now, let's simulate a FULL OUTER JOIN on the same table to fetch all the records twice using the following query.
Here, we're using the FULL OUTER JOIN on the same table. However, the join condition compares all the product names with product names plus an 'x' character. This method retrieves all the records twice.
Now, let’s see the execution plan of this query in PostgreSQL 16.
From the above plan, parallel workers are launched to perform the hash join, introducing the node Parallel Hash Full Join in this release. If we were to execute the same query in previous versions, we wouldn't encounter the Parallel Hash Full Join. Instead, we would see a single Hash Full Join.
In version 16, parallelism extends to support the Parallel Hash Right Join too. Previous versions of PostgreSQL already supported Parallel Hash Left Join. This hash join parallelism significantly boosts performance, leading to improved response times when joining large result sets and retrieving unmatched records.
Previous versions of PostgreSQL provide extensive support for storing and handling JSON documents. They also offer a rich set of developer functions for navigating JSON paths and performing CRUD operations on the documents. However, the latest release of PostgreSQL now supports SQL/JSON standard functions. Other database engines, including Oracle and MySQL, support the SQL/JSON standards using functions such as json_array() and json_object(). Similarly, PostgreSQL 16 has introduced these JSON constructors that adhere to the SQL/JSON standards. These JSON constructor functions enable us to easily create JSON objects from other result sets.
Let’s begin with some simple examples, where we try to demonstrate constructing and validating the JSON objects. Now, let's create a JSON object using the json_object constructor.
Now, let's exclude products with a NULL price by specifying ABSENT ON NULL in json_object.
By using this constructor, we specify an inline validation to the keys like allow only UNIQUE keys and throw exception if there are any duplicate keys found. Consider the below example, where we are constructing the JSON with duplicate keys, which throws exception.
By using this json_object constructor, we can also created nested json_objects as like below.
The above one is an example to just demonstrate the flexibility of creating nested json objects, by using this new constructors.
The json_array constructor which is introduced, like the json_object, helps to build a JSON array using the provided values.
Here is an example where we construct a JSON array using JSON objects. We can also create a JSON array object with all possible type values and exclude NULL values from the list.
In addition to the constructors, PostgreSQL 16 also provides an IS JSON predicate. This predicate checks if a given value is a valid JSON entity. With IS JSON, we can confirm that the input is one of the following: VALUE, SCALAR, OBJECT, or ARRAY. Moreover, we can verify if the JSON has a UNIQUE key set or a non-`UNIQUE` key set.
Consider the following simple JSON document, where we would be performing a few set of validations.
The above example demonstrates a basic validation where we verify whether the given JSON is valid or not. Since it's a valid object, the output is true.
Now, let's take a closer look at the internal objects, such as age, which is JSON VALUE type.
Similarly, let’s evaluate the children object which is of type JSON ARRAY.
The IS JSON also validates the entire JSON object to determine whether it contains unique keys or not. Below is a quick example demonstrating this behavior.
In the above example, we have used the name key twice, which caused it to return the value false.
In PostgreSQL 13, a feature called incremental sort was introduced. This approach to sorting data allows it to be sorted in batches or groups, instead of processing it all at once.
In PostgreSQL 16, this feature was further enhanced to support DISTINCT/Unique sort operations. This means that, when performing data aggregate operations in PostgreSQL 16, we can now get the benefit of having incremental sort.
In general, one query feature gets the major advantage of having the incremental sort. That is, the LIMIT of the data result. Consider the below general query, which gets the top 100 costly products from the goods table.
As shown above, we explicitly disabled the enable_incremental_sort feature to illustrate the benefits of the incremental_sort. The query above performs a sequential scan on the table, then uses top-N heap sort to retrieve the top 100 costly products. This query takes approximately 72ms.
Now, let's execute the same query with increment_sort enabled and observe the response time.
From the results, it's evident that the query response time has significantly decreased from 72ms to 0.145ms.
This improvement is due to the index on the product_name column. This column serves as a Presorted Key for incremental sort, which only returned 149 rows to be sorted into 4 groups. This response is far more efficient than the previous query execution, which required a full table scan to retrieve all records before performing the top-N heap sort.
The incremental sort feature is incredibly useful when processing large amounts of data. It eliminates the need to load and sort the entire data set at once by allowing you to process the data in groups. However, in earlier versions of PostgreSQL, this feature was limited to only ORDER BY and WINDOW functions. Now, it has been expanded to support DISTINCT, where PostgreSQL internally performs a sort to eliminate duplicate values from the list.
Consider the following example, where we add the DISTINCT clause to the previous query. Notice how the incremental_sort operation is selected and it creates 100 incremental sort groups to eliminate duplicates from the result. If we increase the dataLIMIT from 100 to 1000then, we would be having 1000 groups in the incremental sorts.
Custom Collation Rules
With the release of PostgreSQL 16, a significant new feature is the ability to define custom collation rules. These rules can be used to customize the way data is sorted in database. This functionality is particularly useful in handling cases where the conventional sorting techniques do not suffice. For instance, consider a scenario where you have a set of words whose pronunciation order is not the same as their alphabetical order.
Traditional sorting methods would fail to consider this nuance. However, with the custom collation rules in PostgreSQL 16, you can impose a specific pronunciation-based sorting order.
This new feature adds another layer of flexibility and control in managing and manipulating your data. To demonstrate this feature, let's consider the following set of English words. Their alphabetical order differs from the order of their pronunciation.
The words "Hour", "Island", "Knot", "Pneumonia", "Psychology", and "Wrist" are alphabetically sorted. However, when we consider their pronunciation, which includes silent letters, a different letter might appear as the first character. If we wish to sort these words based on pronunciation, this is achievable in PostgreSQL 16 through custom collation rules. While previous versions required custom logic to handle such cases, PostgreSQL 16 allows for easy configuration.
Consider the below demonstrations which solves the above problem with collation custom rules.
As seen in the previous output, the words are organized in ascending alphabetical order. If we want to sort these words based on pronunciation, we need to create a custom collation and configure the rules accordingly. Now, let’s create the collation with custom rules.
As seen above, in the rules section, we stated that Is is less than Ho. This is due to the general English pronunciation where Island is pronounced as Iland and Hour as our. According to alphabetical order, Iland is considered less than our. Now, let's run the same query with these custom collation rules and see whether the we are able to sort the words based on pronunciation.
From the output, we see that the list of words are now sorted based on pronunciation.
In addition to the major features mentioned above, PostgreSQL 16 offers a range of additional functionalities designed for developers. These improvements not only simplify the programming process but also offer more adaptable and efficient solutions to different programming challenges.
This function serves an important purpose: it randomly shuffles the elements within an array. It does this in a completely arbitrary order, guaranteeing that the end result will be unpredictable and indeed, quite random. The beauty of this function lies in its versatility. It can accept any array data type, making it incredibly flexible and adaptable to various use cases.
Once the function has completed its shuffling process, it then returns a new array. This new array contains the same elements as the original one, but with the order of these elements thoroughly randomized. This simple yet powerful function is an essential tool when randomization is required.
From the above example, it's evident that the array_shuffle() function produces different results for the same input. Due to its versatility, we can use it to shuffle any array. For instance, consider the following array of arrays.
As demonstrated above, the array_shuffle() function generates a random array for the same input. This allows us to randomize the data group that we need to process in the application, if necessary.
This particular function operates by randomly selecting a predetermined number of elements from an array. The function accepts two arguments. The first argument that the function accepts is an array. This array can be of any length, and it is from this array that the elements will be selected. The second argument accepted by the function is an integer. This integer represents the number of elements that the function will select from the array. Thus, for instance, if the function is given an array of ten elements and the integer 3, it will randomly select three elements from the array of ten. Consider the example below.
As demonstrated above, the array_sample() function generates a different sample value for the same input. This allows us to randomly select a sample value from the array in our dataset.
Since the array_sample() function can accept any array element, we can also use it to randomly select an array from an array of arrays. Consider the following example.
The any_value() in PostgreSQL 16 provides a handy utility by returning a single value from a specified group of values. This function doesn't guarantee to return a specific value, rather it arbitrarily picks any value from the group. This can be particularly beneficial in scenarios where you don't need a specific value, but any representative value from a group will suffice.
Consider the following example, which demonstrates this function's behaviour.
From the above output, if you see any_value() returning a non-null value of 1, try to run the same process with different values as shown below.
Now, let's execute the operation again with all NULL values and observe the results.
From the above results, it's evident that any_value() returns a non-null value from the group. This function is similar to the COALESCE function but operates on a group of values. It is particularly useful for replacing aggregate functions such as MIN, MAX, AVG when the application doesn't require the aggregated output value, but instead needs any non-null entry from that group.
PostgreSQL 16 introduces several new features and improvements. It now supports parallel hash right join, enhancing performance when joining large result sets. SQL/JSON standard functions are now supported, allowing easy creation of JSON objects from other result sets. Incremental sort, introduced in PostgreSQL 13, has been expanded to support DISTINCT/Unique sort operations, significantly reducing response times. Custom collation rules can be defined, offering flexibility in data sorting. New functions include array_shuffle for randomizing array elements, array_sample for randomly selecting array elements, and any_value for returning a single value from a group of values.
PostgreSQL 16 is now supported on Tessell.
Watch out for Part -2 of PostgreSQL 16.
Schedule a demo today to try PostgreSQL 16 & more on Tessell.