This post is the first in our series on PostgreSQL, a highly customizable and standards-compliant open-source object-relational database system.
The modern world isn’t just a sea of data, it’s also awash in data storage options. It used to be that a company just had to choose between one of the “Big 3” (DB2, MS-SQL, or Oracle), or MySQL or PostgreSQL. Today there are a few more relational databases, a bunch of NoSQL databases, as well as specialty databases such as graph databases or time-series databases. Many companies now run multiple different database technologies.
What if there was one solution that satisfied all these needs?
The hidden costs of data
There’s one thing that’s important to understand: data is not easy. If something claims to make it easy, either it’s glossing over details or just ignoring them. A choice that makes one data problem easy makes other problems harder:
- Lack of schema makes ingestion easy and analysis difficult
- Strong OLTP performance reduces OLAP performance
- Consistency, Availability, Partition tolerance: pick two (the CAP theorem)
Understanding how your data technologies work under the hood is critical for making informed decisions and designing an appropriate data architecture.
Here’s your magic bullet!
Just kidding. Sort of.
Postgres has a lot of features that most users aren’t aware of. Unfortunately, there’s no make_data_easy = true setting, but there are a lot of features that make various problems easier. In fact, it’s best not to think of Postgres as a database, because it goes far beyond that. Think of it as a data platform.
Features you’ve never heard of
This is certainly not a complete list, but it covers some of the more overlooked features.
Arrays and Composite types
Tables are not the only way to store sets of information in Postgres, nor are they the only way to pass information around your systems. Arrays and composite types work together the same way that lists and dictionaries/hashes do in other languages: they allow you to create a single object that represents an arbitrarily complex set of data. That complex object can be passed to and from functions, easily converted to/from JSON (and XML), and even stored as a field in a table.
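As a minimal sketch of this idea (the type and table names here are invented for illustration), a composite type can be nested inside an array, stored in a table, and serialized to JSON in one step:

```sql
-- A composite type: like a struct/dictionary with fixed fields
CREATE TYPE sensor_reading AS (
    taken_at timestamptz,
    celsius  numeric
);

-- A table holding an array of those composite values in one column
CREATE TABLE station (
    id       serial PRIMARY KEY,
    name     text,
    readings sensor_reading[]
);

INSERT INTO station (name, readings)
VALUES ('roof', ARRAY[ROW(now(), 21.4)::sensor_reading]);

-- Convert the whole row, nested array and all, to JSON
SELECT to_json(s) FROM station s;
```

The same object can be passed to and from functions or read back out of JSON, which is what makes composites useful as a general-purpose data-interchange structure.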
Can you imagine writing an application in a language that didn’t support lists and dictionaries? Of course not. So why wouldn’t you extend that notion to your database? A quick (still work in progress!) example of this idea is cat_snap (https://github.com/decibel/cat_snap), a tool that defines a composite type that will store the complete contents of the catalog and statistics for an entire Postgres database, as a single object. A single SQL command will output all that information in a format that can be read directly into the composite type. Imagine how easy that makes it to gather statistics on a bunch of databases.
There’s another powerful use for arrays and composites: acting like a column store. Most column stores are very picky about how and when you can write to them. By representing a series of data points as an array instead of row-by-row, you can gain the benefits of a column store with more flexibility in writing data.
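A sketch of the column-store pattern, assuming a hypothetical table of per-device measurements: instead of one row per reading, each row packs a day’s readings into an array, which keeps related values physically adjacent while still allowing in-place appends.

```sql
-- One row per device per day; readings packed into an array
CREATE TABLE readings_by_day (
    device_id int,
    day       date,
    celsius   numeric[]
);

-- Append a new measurement instead of inserting a whole new row
UPDATE readings_by_day
   SET celsius = celsius || 22.1
 WHERE device_id = 42 AND day = current_date;

-- Unnest when you need row-at-a-time access again
SELECT device_id, unnest(celsius) AS celsius
  FROM readings_by_day;
```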
New base types
I’ve never seen another database that makes it as easy to create a brand new data type as Postgres. This makes it easy to represent, store and index information inside Postgres that meshes seamlessly with how that data is handled elsewhere.
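The skeleton below follows the shell-type pattern from the Postgres documentation; the input/output function bodies live in a C shared library, and the names (complex, complex_in, complex_out) are placeholders, not a shipped type:

```sql
-- Declaring functions that return the not-yet-defined type
-- automatically creates a "shell" type to be filled in below.
CREATE FUNCTION complex_in(cstring) RETURNS complex
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION complex_out(complex) RETURNS cstring
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;

-- Fill in the shell: now 'complex' is a full base type that can be
-- stored in tables, indexed, and used in operators like any built-in.
CREATE TYPE complex (
    input          = complex_in,
    output         = complex_out,
    internallength = 16
);
```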
Extensions
Similar to packages in other languages, Postgres extensions allow you to create a package of Postgres objects for easy use by others. The Postgres Extension Network (PGXN) is the main repository for extensions, and it currently contains over 200 of them.
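Installing an extension that is present on the server is a one-liner; hstore (a key/value type shipped in Postgres contrib) makes a handy example:

```sql
-- Load the extension's objects into the current database
CREATE EXTENSION hstore;

-- Its type and operators are immediately available
SELECT 'a=>1, b=>2'::hstore -> 'b';   -- returns '2'
```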
Multiple procedural languages
Support for running many different programming languages inside the database is a killer feature for advanced data science and analytics. It allows you to perform complex pre-processing and filtering of data on the same machine that houses the data. Use Python, R, Lua, or any number of other languages.
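For example, with PL/Python (the plpython3u package must be installed on the server), a function body is ordinary Python that runs inside the database:

```sql
CREATE EXTENSION plpython3u;

-- The function body between the $$ markers is plain Python
CREATE FUNCTION pymax(a integer, b integer) RETURNS integer
AS $$
    return max(a, b)
$$ LANGUAGE plpython3u;

SELECT pymax(3, 7);  -- returns 7
```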
Foreign data wrappers
Postgres has over 100 foreign data wrappers that allow you to connect to other data sources and interact with them as if they were Postgres tables. Thanks to the Multicorn extension, it is very easy to create a new foreign data wrapper.
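A sketch using postgres_fdw, the wrapper for other Postgres servers (the host, database, table, and credentials here are hypothetical):

```sql
CREATE EXTENSION postgres_fdw;

-- Describe the remote server and how to authenticate against it
CREATE SERVER remote_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'analytics');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER remote_pg
    OPTIONS (user 'reporter', password 'secret');

-- A local "table" that is really a window onto the remote one
CREATE FOREIGN TABLE remote_events (
    id      bigint,
    payload jsonb
) SERVER remote_pg OPTIONS (schema_name 'public', table_name 'events');

-- Queries run against the remote database transparently
SELECT count(*) FROM remote_events;
```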
What many people don’t realize is that, thanks to the robust procedural language support, you don’t have to use a foreign data wrapper to speak to a remote data source. For example, it would be very difficult to interface with Amazon AWS via a foreign data wrapper (though S3 is a good fit, and there is a foreign data wrapper for it). But since there is a Python AWS API, and because you can run Python code inside Postgres, you can interface directly with AWS via Postgres.
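A hedged sketch of that idea: a PL/Python function calling S3 through boto3. It assumes plpython3u is enabled, boto3 is installed for the server’s Python, and AWS credentials are available to the server process; the function name and bucket are invented for illustration.

```sql
-- Hypothetical helper: list the keys in an S3 bucket from inside Postgres
CREATE FUNCTION list_s3_keys(bucket text) RETURNS SETOF text
AS $$
    import boto3  # assumed installed for the server's Python interpreter

    s3 = boto3.client('s3')
    for obj in s3.list_objects_v2(Bucket=bucket).get('Contents', []):
        yield obj['Key']
$$ LANGUAGE plpython3u;
```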
Do even more complex things
Postgres has a number of internal “hooks” that allow extensions to directly alter how certain behaviors work in Postgres, without modifying any Postgres source code. One example of that is the citus extension, created by Citus Data. This extension brings horizontal scale to Postgres. 100+ TB databases are no problem with this extension.
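From the user’s side, distributing a table with citus is a single function call once the extension is installed on a coordinator node (the table and column here are illustrative):

```sql
CREATE EXTENSION citus;

CREATE TABLE events (
    user_id     bigint,
    occurred_at timestamptz,
    payload     jsonb
);

-- Shard the table across the worker nodes by user_id
SELECT create_distributed_table('events', 'user_id');
```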
As you can see, there are many features in Postgres that make complex data challenges easier to handle. More features are added constantly (some in Postgres itself, some as extensions).
It even comes with a world class relational database engine.