TodoBI - Business Intelligence, Big Data, ML y AI

Citusdata, PostgreSQL Columnar Store for Analytic Workloads

We are open-sourcing our columnar store extension for PostgreSQL and sharing it with the community! Columnar stores bring notable benefits for analytic workloads, where data is loaded in batches.
This columnar store extension uses the Optimized Row Columnar (ORC) format for its data layout. ORC improves upon the RCFile format developed at Facebook, and brings the following benefits:
  • Compression: Reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs.
  • Column projections: Only reads column data relevant to the query. Improves performance for I/O bound queries.
  • Skip indexes: Stores min/max statistics for row groups, and uses them to skip over unrelated rows.
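
To make column projections and skip indexes concrete, here is a minimal Python sketch of the idea. It is not the extension's actual implementation or the ORC file layout: the `RowGroup` class, the `build_row_groups` and `scan` functions, and the sample data are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """A batch of rows stored column by column (illustrative only)."""
    columns: dict  # column name -> list of values
    min_max: dict  # column name -> (min value, max value)


def build_row_groups(rows, column_names, group_size):
    """Split rows into groups and pivot each group into per-column lists."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        columns = {
            name: [row[i] for row in chunk]
            for i, name in enumerate(column_names)
        }
        # Skip index: keep min/max statistics per column for this group.
        min_max = {name: (min(v), max(v)) for name, v in columns.items()}
        groups.append(RowGroup(columns, min_max))
    return groups


def scan(groups, select, filter_column, low, high):
    """Return (matching rows, number of row groups actually read)."""
    results = []
    groups_read = 0
    for group in groups:
        group_min, group_max = group.min_max[filter_column]
        if group_max < low or group_min > high:
            continue  # the statistics prove no row here can match
        groups_read += 1
        # Column projection: touch only the filter and selected columns.
        filter_values = group.columns[filter_column]
        projected = [group.columns[name] for name in select]
        for i, value in enumerate(filter_values):
            if low <= value <= high:
                results.append(tuple(col[i] for col in projected))
    return results, groups_read


# Hypothetical data: 100 rows, three columns, 25 rows per group.
rows = [(i, i * 10, f"note-{i}") for i in range(100)]
groups = build_row_groups(rows, ("id", "amount", "note"), group_size=25)
matches, groups_read = scan(groups, ["amount"], "id", low=30, high=40)
```

In this toy run only one of the four row groups overlaps the requested `id` range, so the other three are skipped without being read, and the `note` column is never touched at all.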
Further, we used the Postgres foreign data wrapper APIs and type representations with this extension. This brings:
  • Support for 40+ Postgres data types. The user can also create new types and use them.
  • Statistics collection. PostgreSQL's query optimizer uses these stats to evaluate different query plans and pick the best one.
  • Simple setup. Create a foreign table and copy data into it. Run SQL.
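
The "create foreign table, copy data, run SQL" steps might look roughly like the following sketch, written as SQL statements driven from Python. It assumes the extension is installed under the name `cstore_fdw` (the name used in Citus Data's public repository, not stated in this post); the server name, the `page_views` table, its columns, and the CSV path are hypothetical.

```python
# Assumption: the extension is named cstore_fdw and accepts a
# "compression" table option. Table, column, and file names are made up.
SETUP_STATEMENTS = [
    # One-time setup: load the extension and define a foreign server.
    "CREATE EXTENSION cstore_fdw",
    "CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw",
    # Create the columnar table as a foreign table.
    """CREATE FOREIGN TABLE page_views (
           view_date date,
           page_id   integer,
           views     bigint
       ) SERVER cstore_server OPTIONS (compression 'pglz')""",
    # Load data in one batch, then collect optimizer statistics.
    "COPY page_views FROM '/tmp/page_views.csv' WITH CSV",
    "ANALYZE page_views",
]

EXAMPLE_QUERY = (
    "SELECT page_id, sum(views) FROM page_views "
    "WHERE view_date >= '2014-01-01' GROUP BY page_id"
)


def run_setup(cursor, statements=SETUP_STATEMENTS):
    """Execute each statement in order on a DB-API 2.0 cursor."""
    for sql in statements:
        cursor.execute(sql)
```

`run_setup` accepts any DB-API 2.0 cursor connected to a PostgreSQL database where the extension is available; after it completes, `EXAMPLE_QUERY` can be executed like a query against any other table.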

It's worth noting that the columnar store extension is self-contained. If you're a PostgreSQL user, you can get the entire source code and build it by following the instructions on our GitHub page. You can even join columnar store tables and regular Postgres tables in the same SQL query.