44

Does anyone have experience of using PostgreSQL for an OLAP setup, using cubes against the database etc. Having come across a number of idiosyncracies when using MySQL for OLAP, are there reasons in favour of using PostgreSQL instead (assuming that I want to go the open source route)?

Tomasz Jakub Rup
  • 10,502
  • 7
  • 48
  • 49
davek
  • 22,499
  • 9
  • 75
  • 95
  • 3
    We are using PostgreSQL 9.1 and Saiku. All cube schemas are defined in Schema Workbench (Pentaho). So far, I do not see any issues with such configuration. We do have fact tables with approx. 1,000,000 rows and all works well without any pre-aggregation. – Tomas Greif Nov 13 '12 at 18:39

2 Answers2

29

There are a number of data warehousing software vendors that are based on Postgresql (and contribute OLAP related changes back to core fairly regularly). Check out https://greenplum.org/. You'll find that PG works a lot better (for nearly any workload, OLAP especially) than MySQL. Greenplum and other similar solutions should work a bit better than PG depending on your data sets and use cases.

Dave Jarvis
  • 30,436
  • 41
  • 178
  • 315
Trey
  • 11,032
  • 1
  • 23
  • 21
4

PGSQL is much better suited for Data Warehousing compared to MySQL. We had thought initially to go with MySQL, but it performs poorly in aggregations if data grows to a few million rows. PGSQL performs almost 20 times faster in caparison with MySQL for 20 million records for a single fact table on same hardware setup. If for some reason you choose to go with MySQL, then you should use MyISAM storage engine for fact tables rather then InnoDB; you will see slightly better performance.

Nathan Tuggy
  • 2,237
  • 27
  • 30
  • 38
Red Boy
  • 5,429
  • 3
  • 28
  • 41
  • 1
    Just a note. AFAIK, MyISAM engine does not support ACID transactions. Large size dataset is very likely to be broken like PostgreSQL which is always ACID transactional. – eonil May 09 '18 at 07:49
  • 4
    ACID is mostly irrelevant in OLAP systems because they are read-only environments, with the exception of SCDs. The ETL process must ensure data integrity. This assumes you have implemented a star schema and Kimball's enterprise DW bus architecture. – Clint Pachl Jul 24 '20 at 10:05