Introduction
TimescaleDB is a time-series database built on top of PostgreSQL. It is designed to handle large volumes of time-stamped data and to provide efficient, scalable query performance over it.
The main difference between TimescaleDB and regular PostgreSQL is that TimescaleDB is optimized for storing and querying time-series data, whereas PostgreSQL is a general-purpose database that can handle many types of data. TimescaleDB extends PostgreSQL with features and capabilities specifically for time-series data management, such as automatic time partitioning, optimized indexing, and compression.
TimescaleDB uses a distributed hypertable architecture that partitions data based on time intervals, enabling efficient querying of large volumes of data over time. It also provides advanced analytics and visualization capabilities for time-series data, including continuous aggregates and window functions.
In this post, we’ll walk through a general overview of time-series data, how TimescaleDB turns PostgreSQL into a time-series database, and how to set it up and perform common time-series tasks.
Tessell for PostgreSQL supports TimescaleDB in all configurations, whether using a Single instance or High Availability setup.
Timescale Use Cases
As human beings, we produce data as we move and operate through time. Therefore, everything we do in our business and personal lives creates time series data.
1. Internet of Things (IoT): IoT is an obvious example of time series data in practice. Everything from mobile devices, home appliances, thermostats, and security cameras to automobiles can continuously collect, store, use, and transmit data. With the proliferation of IoT devices, there is a growing need to capture and analyze time-series data generated by these devices. TimescaleDB is an ideal solution for storing and analyzing IoT data, such as sensor data, device logs, and telemetry data.
2. Financial Services: Financial services firms often deal with large volumes of time-series data, such as stock prices, trading data, and transaction data. TimescaleDB can help these firms store and analyze this data more efficiently and accurately, enabling better decision-making.
3. DevOps and Monitoring: DevOps teams and system administrators often need to monitor and analyze time-series data related to system performance, network traffic, and application metrics. TimescaleDB can help to store and analyze this data in real-time, enabling faster and more accurate troubleshooting.
4. Digital Marketing: Digital marketing firms often need to analyze large volumes of time-series data related to user behavior, website traffic, and ad campaigns. TimescaleDB can help to store and analyze this data more efficiently and accurately, enabling better campaign optimization.
5. Energy and Utilities: Energy and utility companies often need to analyze time-series data related to power consumption, network outages, and equipment performance. TimescaleDB can help to store and analyze this data more efficiently, enabling better predictive maintenance and outage prevention.
TimescaleDB is useful in any scenario where large volumes of time-series data need to be stored and analyzed efficiently and accurately.
Features & Benefits of TimescaleDB
1. Time partitioning: TimescaleDB automatically partitions data by time, which means that data is stored in separate partitions based on time intervals. This enables more efficient querying and faster analysis of time-series data.
2. Advanced indexing: TimescaleDB provides advanced indexing features that enable fast and efficient querying of time-series data. TimescaleDB uses multi-dimensional indexing to optimize queries that involve both time and other dimensions.
3. Compression: TimescaleDB provides built-in compression for time-series data, which reduces storage requirements and improves query performance. TimescaleDB uses a combination of run-length encoding, delta encoding, and bit-packing to achieve high levels of compression.
4. Continuous aggregates: TimescaleDB provides continuous aggregate functions that enable real-time analytics of time-series data. This enables users to see trends and patterns in real-time, which can be useful for proactive decision-making.
5. Window functions: TimescaleDB provides window functions that enable more advanced analytics of time-series data. Window functions allow users to calculate aggregates over a sliding window of time, which can be useful for identifying trends and patterns in time-series data.
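Continuous aggregates and time bucketing come together naturally: a continuous aggregate is a materialized view over time_bucket() that TimescaleDB refreshes incrementally. A minimal sketch, assuming a hypothetical conditions hypertable with time, device_id, and temperature columns (not part of this post's dataset):

```sql
-- Continuous aggregate: daily average temperature per device.
-- "conditions" and its columns are illustrative.
CREATE MATERIALIZED VIEW conditions_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS day,
       device_id,
       avg(temperature)           AS avg_temp
FROM conditions
GROUP BY day, device_id;
```

Querying conditions_daily then reads precomputed buckets instead of re-aggregating raw rows on every request.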
So how does TimescaleDB’s architecture update PostgreSQL’s capabilities?
Best of PostgreSQL with TimescaleDB
The best thing about TimescaleDB being a PostgreSQL extension is that backup and streaming replication automatically apply to it: pg_basebackup can back up data stored in TimescaleDB. Similarly, restore and point-in-time recovery also apply automatically.
The same goes for synchronous and asynchronous streaming replication. Therefore, Tessell PostgreSQL with standby instances provides read scalability and high availability and would do so for TimescaleDB as well.
The above is key, especially in an environment where popular time series databases have chosen the open-core model, making features that are necessary for production environments only available under proprietary licenses. So, you continue to use the best of PostgreSQL without any additional overheads.
Now that there is an alternative, let’s check it out.
TimescaleDB: Hypertables
Essentially, you create a table in TimescaleDB just as you would in vanilla PostgreSQL. The only requirement is that the table has a timestamp (or date) column. (You can get a more in-depth overview of TimescaleDB’s Architecture here.)
Once this table is created, the built-in helper function, create_hypertable(), performs all the necessary tasks to create the hypertable and its child objects. The hypertable holds the information about its child partitions, called chunks, which divide the data by time interval and, optionally, by a partitioning key space.
Data ingested into a time series table ultimately ends up in chunks, its final resting place.
Chunks can be managed (resized, dropped, etc) from the command line. Dropping chunks to purge old data is much more efficient than the usual alternatives. But, the true elegance of TimescaleDB lies in the fact that it is a PostgreSQL extension.
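As a sketch of how this looks in practice (the table and column names below are illustrative, not from this post's dataset):

```sql
-- A vanilla PostgreSQL table with a timestamp column...
CREATE TABLE conditions (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT,
    temperature DOUBLE PRECISION
);

-- ...turned into a hypertable; chunks are created automatically
-- as data is ingested.
SELECT create_hypertable('conditions', 'time');
```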
How to use TimescaleDB with Tessell for PostgreSQL
Tessell for PostgreSQL supports TimescaleDB as a first-class integration. To create a PostgreSQL database, log in to the Tessell console; you can create a Tessell for PostgreSQL database in minutes for free on AWS or Azure.
To provision a PostgreSQL database in Tessell, go to Provisioning:
- Select the required version of PostgreSQL.
- Select the choice of cloud and region.
- Select the required shape/configuration needed for the database instance.
Follow the subsequent steps on the console to provision the database. Once the database is created, use your client of choice to connect to it.
Connect to the database tdb (sample database)
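A typical connection uses psql; the host and user below are placeholders for the values shown on your Tessell console:

```shell
psql -h <tessell-service-host> -p 5432 -U <master-user> -d tdb
```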
To use TimescaleDB in an existing database, run the following command:
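The extension is enabled per database:

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;
```

You can verify the installation with \dx in psql, which lists the installed extensions.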
Now that we are all set, let’s explore the capabilities of TimescaleDB with a use case.
A TimescaleDB use case with Tessell PostgreSQL
Let’s walk through a use case of TimescaleDB within Tessell for PostgreSQL by examining solar power plant generation data. This is a time-series dataset with a sampling frequency of 15 minutes, i.e. the power plant produces a new set of readings every 15 minutes.
This data has been gathered at two solar power plants in India over a 34-day period. It consists of two pairs of files: each pair has one power generation dataset and one sensor readings dataset. The power generation data is gathered at the inverter level (each inverter has multiple lines of solar panels attached to it), while the sensor data is gathered at the plant level by a single, optimally placed array of sensors.
To showcase the capabilities of TimescaleDB, we’ll perform the following tasks for this use case:
- Create the schema in TimescaleDB
- Ingest data
- Query data
- Purge old data
- Compression
1. Creating Schema
- First, create a new database.
- Now, create a TimescaleDB extension in the created database.
- Create a table for storing power plant generation data.
- Create the corresponding hypertable in TimescaleDB, with chunks sized to store one day’s worth of data (specified via the chunk_time_interval parameter). This interval maps well to the generation data we will be ingesting and analyzing.
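The steps above can be sketched as follows; the table and column names are assumptions based on the common layout of this solar generation dataset:

```sql
-- 1. Create a new database and connect to it (\c is a psql meta-command).
CREATE DATABASE tdb;
\c tdb

-- 2. Enable the TimescaleDB extension in this database.
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- 3. Table for the power plant generation readings.
CREATE TABLE generation_data (
    date_time   TIMESTAMP NOT NULL,
    plant_id    INTEGER,
    source_key  TEXT,              -- inverter identifier
    dc_power    DOUBLE PRECISION,
    ac_power    DOUBLE PRECISION,
    daily_yield DOUBLE PRECISION,
    total_yield DOUBLE PRECISION
);

-- 4. Turn it into a hypertable with one chunk per day of data.
SELECT create_hypertable('generation_data', 'date_time',
                         chunk_time_interval => INTERVAL '1 day');
```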
2. Ingesting data
Let’s ingest data into the table.
Since the data is split into multiple files for multiple plants, we use a bash script to load each CSV file using psql.
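A loader along these lines works; the file pattern, database name, and table name are placeholders matching the assumptions above:

```shell
#!/bin/bash
# Load every generation CSV into the hypertable via psql's \copy,
# which streams each file from the client side.
for f in Plant_*_Generation_Data.csv; do
    psql -d tdb -c "\copy generation_data FROM '$f' CSV HEADER"
done
```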
After data is ingested into the table, the partition and chunk information becomes evident.
The query plan for a simple count(*) of the records in the table after all the data is ingested looks like this:
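The chunk layout and the plan can be reproduced with the following queries (assuming the generation_data table name used in this example):

```sql
-- List the chunks backing the hypertable.
SELECT show_chunks('generation_data');

-- Plan for a full count across all chunks.
EXPLAIN SELECT count(*) FROM generation_data;
```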
As you can see, the query needs to scan each and every chunk to resolve the count.
3. Querying data
The optimal query in TimescaleDB hits a single chunk within a hypertable.
With our data and corresponding chunk configuration (remember, we configured a chunk to contain a day’s worth of data), the most efficient query would be one that includes a specific date in the WHERE clause. Here’s an example of such a query and its corresponding query plan.
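A sketch of such a query, using the assumed generation_data schema and an illustrative date from the collection period:

```sql
-- Constraining the WHERE clause to a single day means the planner
-- only needs to touch the one chunk covering that day.
EXPLAIN
SELECT sum(ac_power)
FROM generation_data
WHERE date_time >= '2020-05-20'
  AND date_time <  '2020-05-21';
```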
As can be seen above, one (and only one) chunk is considered for the query.
For range scans on time, multiple chunks will be considered for the query. But, here’s where the PostgreSQL parallel query scan comes into play and provides big benefits.
TimescaleDB also has other built-in time functions that are more flexible and convenient to use, such as time_bucket() which provides for arbitrary time intervals as compared to PostgreSQL’s built-in date_trunc() function.
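For example, bucketing the readings into arbitrary 15-minute intervals (again assuming the generation_data schema):

```sql
-- Average AC power per 15-minute bucket; date_trunc() cannot
-- express an arbitrary interval like this.
SELECT time_bucket('15 minutes', date_time) AS bucket,
       avg(ac_power)                        AS avg_ac_power
FROM generation_data
GROUP BY bucket
ORDER BY bucket;
```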
4. Purging old data
When dealing with time-series data, it is convenient to have a mechanism to purge old data without impacting incoming data. Options like DELETE FROM table WHERE time < 'date' are problematic, since they may involve intrusive locking and therefore hurt ingest performance.
In TimescaleDB, you can drop a chunk that contains old data, and the best part is that it does not impact ongoing ingest. Here’s an example where we drop chunks containing data older than a specific date.
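With the assumed table name and an illustrative cutoff date, this looks like:

```sql
-- Removes whole chunks older than the cutoff; far cheaper than a
-- row-by-row DELETE and non-intrusive to ongoing ingest.
SELECT drop_chunks('generation_data', older_than => TIMESTAMP '2020-06-01');
```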
5. Compression
Data can be compressed to reduce the amount of storage required and increase the speed of some queries. This is a cornerstone feature of Timescale. When new data is added to your database, it is in the form of uncompressed rows. Timescale uses a built-in job scheduler to convert this data to the form of compressed columns. This occurs across chunks of Timescale hypertables.
To enable compression on an existing hypertable, use the following command:
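Using the assumed table name, compression is enabled with ALTER TABLE; segmenting by the inverter identifier is one reasonable choice for this dataset:

```sql
ALTER TABLE generation_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'source_key'
);
```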
Add a compression policy to compress chunks that are older than seven days:
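With the policy in place, TimescaleDB’s background job scheduler compresses eligible chunks automatically (table name assumed as above):

```sql
SELECT add_compression_policy('generation_data', INTERVAL '7 days');
```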
These storage savings directly translate into saving costs.
Check out real-world use cases where compression has greatly reduced storage requirements and, in turn, cost.
Conclusion
In summary, TimescaleDB provides a specialized solution for managing and analyzing time-series data, while regular PostgreSQL is a general-purpose database that can handle various types of data.
When starting with TimescaleDB on Tessell for PostgreSQL, it is best to understand your time series data, the retention period, and most importantly what kind of queries you will run against it. These will help you determine the chunk interval for the data which closely correlates to ingest performance, query performance, and the ability to easily purge old data.
TimescaleDB scales very effectively to store time series data compared to vanilla PostgreSQL. This, along with PostgreSQL features such as backup and high availability, renders it a powerful open-source alternative to other time series databases.
The example shown in this post will help you get started quickly with TimescaleDB in Tessell PostgreSQL. To take advantage of its ability to be deployed as a high-availability time-series database, enable it within a Tessell for PostgreSQL service.
And if you’re looking for an enterprise-grade production PostgreSQL database for your Time series workloads, try Tessell for PostgreSQL.