161

I want to run a small PostgreSQL database which runs in memory only, for each unit test I write. For instance:

@Before
void setUp() {
    String port = runPostgresOnRandomPort();
    connectTo("postgres://localhost:"+port+"/in_memory_db");
    // ...
}

Ideally I'll have a single postgres executable checked into the version control, which the unit test will use.

Something like HSQL, but for postgres. How can I do that?

Were can I get such a Postgres version? How can I instruct it not to use the disk?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Chi-Lan
  • 3,575
  • 3
  • 22
  • 24

10 Answers10

138

(Moving my answer from Using in-memory PostgreSQL and generalizing it):

You can't run Pg in-process, in-memory

I can't figure out how to run in-memory Postgres database for testing. Is it possible?

No, it is not possible. PostgreSQL is implemented in C and compiled to platform code. Unlike H2 or Derby you can't just load the jar and fire it up as a throwaway in-memory DB.

Its storage is filesystem based, and it doesn't have any built-in storage abstraction that would allow you to use a purely in-memory datastore. You can point it at a ramdisk, tempfs, or other ephemeral file system storage though.

Unlike SQLite, which is also written in C and compiled to platform code, PostgreSQL can't be loaded in-process either. It requires multiple processes (one per connection) because it's a multiprocessing, not a multithreading, architecture. The multiprocessing requirement means you must launch the postmaster as a standalone process.

Use throwaway containers

Since I originally wrote this the use of containers has become widespread, well understood and easy.

It should be a no-brainer to just configure a throw-away postgres instance in a Docker container for your test uses, then tear it down at the end. You can speed it up with hacks like LD_PRELOADing libeatmydata to disable that pesky "don't corrupt my data horribly on crash" feature ;).

There are a lot of wrappers to automate this for you for any test suite and language or toolchain you would like.

Alternative: preconfigure a connection

(Written before easy containerization; no longer recommended)

I suggest simply writing your tests to expect a particular hostname/username/password to work, and having the test harness CREATE DATABASE a throwaway database, then DROP DATABASE at the end of the run. Get the database connection details from a properties file, build target properties, environment variable, etc.

It's safe to use an existing PostgreSQL instance you already have databases you care about in, so long as the user you supply to your unit tests is not a superuser, only a user with CREATEDB rights. At worst you'll create performance issues in the other databases. I prefer to run a completely isolated PostgreSQL install for testing for that reason.

Instead: Launch a throwaway PostgreSQL instance for testing

Alternately, if you're really keen you could have your test harness locate the initdb and postgres binaries, run initdb to create a database, modify pg_hba.conf to trust, run postgres to start it on a random port, create a user, create a DB, and run the tests. You could even bundle the PostgreSQL binaries for multiple architectures in a jar and unpack the ones for the current architecture to a temporary directory before running the tests.

Personally I think that's a major pain that should be avoided; it's way easier to just have a test DB configured. However, it's become a little easier with the advent of include_dir support in postgresql.conf; now you can just append one line, then write a generated config file for all the rest.

Faster testing with PostgreSQL

For more information about how to safely improve the performance of PostgreSQL for testing purposes, see a detailed answer I wrote on this topic earlier: Optimise PostgreSQL for fast testing

H2's PostgreSQL dialect is not a true substitute

Some people instead use the H2 database in PostgreSQL dialect mode to run tests. I think that's almost as bad as the Rails people using SQLite for testing and PostgreSQL for production deployment.

H2 supports some PostgreSQL extensions and emulates the PostgreSQL dialect. However, it's just that - an emulation. You'll find areas where H2 accepts a query but PostgreSQL doesn't, where behaviour differs, etc. You'll also find plenty of places where PostgreSQL supports doing something that H2 just can't - like window functions, at the time of writing.

If you understand the limitations of this approach and your database access is simple, H2 might be OK. But in that case you're probably a better candidate for an ORM that abstracts the database because you're not using its interesting features anyway - and in that case, you don't have to care about database compatibility as much anymore.

Tablespaces are not the answer!

Do not use a tablespace to create an "in-memory" database. Not only is it unnecessary as it won't help performance significantly anyway, but it's also a great way to disrupt access to any other you might care about in the same PostgreSQL install. The 9.4 documentation now contains the following warning:

WARNING

Even though located outside the main PostgreSQL data directory, tablespaces are an integral part of the database cluster and cannot be treated as an autonomous collection of data files. They are dependent on metadata contained in the main data directory, and therefore cannot be attached to a different database cluster or backed up individually. Similarly, if you lose a tablespace (file deletion, disk failure, etc), the database cluster might become unreadable or unable to start. Placing a tablespace on a temporary file system like a ramdisk risks the reliability of the entire cluster.

because I noticed too many people were doing this and running into trouble.

(If you've done this you can mkdir the missing tablespace directory to get PostgreSQL to start again, then DROP the missing databases, tables etc. It's better to just not do it.)

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • 1
    I am unclear about the warning provided here. If I'm trying to run Unit Tests fast, why is there a cluster involved? Shouldn't this just be all on my local, throwaway instance of PG? If the cluster (of one) is corrupted why does that matter, I was planning to delete it anyways. – Gates VP Nov 20 '14 at 04:16
  • 1
    @GatesVP PostgreSQL uses the term "cluster" in a somewhat odd manner, to refer to the PostgreSQL instance (data directory, collection of databases, postmaster, etc). So it's not a "cluster" in the sense of "compute cluster". Yes, that's annoying, and I'd like to see that terminology change. And if it's throwaway then of course it doesn't matter, but people regularly attempt to have a throwaway in-memory *tablespace* on a PostgreSQL install that contains data they otherwise care about. That's a problem. – Craig Ringer Nov 20 '14 at 04:18
  • OK, that's both _"what I thought"_ and _"very scary"_, the RAMDrive solution clearly only belongs on a local DB that contains no useful data. But why would anyone want to run unit tests against a machine that isn't their own machine? Based on your answer, Tablespaces + RamDisk sounds perfectly legitimate for an actual Unit Test instance of PGSQL running solely on your local machine. – Gates VP Nov 20 '14 at 04:41
  • 1
    @GatesVP Some people keep things they care about on their local machine - which is fine, but it's then a bit silly to be running unit tests against the same DB install. People are silly, though. Some of them also don't keep proper backups. Wails ensue. – Craig Ringer Nov 20 '14 at 04:49
  • In any case, if you're going to go the ramdisk option you really want WAL on the ramdisk too, so you might as well `initdb` a whole new Pg install there. But really, there's little difference between a Pg that's tweaked for fast testing on normal storage (fsync=off and other data durability/safety features turned off) than running on a ramdisk, at least on Linux. – Craig Ringer Nov 20 '14 at 04:50
  • SCROLL DOWN to Rubms answer, there is now an embedded postgres library – ekcrisp Jan 03 '20 at 17:08
  • @CraigRinger - I recently answerd [this question](https://dba.stackexchange.com/questions/318417/postgresql-database-cluster-vs-one-server-with-many-databases) and I searched for the reason why PostgreSQL uses the word "cluster" in the way that it does - I speculated that it might for the same reason(s) that we use the term "relation" and "tuple" and "attribute" rather than "table", "row/record" and "column" - would it be anything to do with ancient history - a cluster meaning more than one database on the same machine - maybe this was a novelty then. Pity Michael Stonebraker isn't here? – Vérace Oct 22 '22 at 21:04
71

Or you could create a TABLESPACE in a ramfs / tempfs and create all your objects there.
I recently was pointed to an article about doing exactly that on Linux. The original link is dead. But it was archived (provided by Arsinclair):

Warning

This can endanger the integrity of your whole database cluster.
Read the added warning in the manual.
So this is only an option for expendable data.

For unit-testing it should work just fine. If you are running other databases on the same machine, be sure to use a separate database cluster (which has its own port) to be safe.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 8
    I really think this is bad advice. Do not do this. Instead, `initdb` a new postgres instance in a tempfs or ramdisk. Do *not* use a tablespace in a tempfs etc, it's fragile and pointless. You're better off using a normal tablespace and creating `UNLOGGED` tables - it will perform similarly. And it won't address the WAL performance and fsync factors unless you take actions that will risk the integrity of the whole DB (see https://stackoverflow.com/q/9407442/398670). Don't do it. – Craig Ringer Aug 11 '17 at 01:20
  • 1
    The link in the answer is dead. But it was archived: https://web.archive.org/web/20160319031016/http://magazine.redhat.com/2007/12/12/tip-from-an-rhce-memory-storage-on-postgresql/ – Raman Sinclair Jul 15 '21 at 21:07
56

This is not possible with Postgres. It does not offer an in-process/in-memory engine like HSQLDB or MySQL.

If you want to create a self-contained environment you can put the Postgres binaries into SVN (but it's more than just a single executable).

You will need to run initdb to setup your test database before you can do anything with this. This can be done from a batch file or by using Runtime.exec(). But note that initdb is not something that is fast. You will definitely not want to run that for each test. You might get away running this before your test-suite though.

However while this can be done, I'd recommend to have a dedicated Postgres installation where you simply recreate your test database before running your tests.

You can re-create the test-database by using a template database which makes creating it quite fast (a lot faster than running initdb for each test run)

  • 9
    It looks like the second answer by Erwin below should be marked as the right answer – vfclists Apr 09 '12 at 11:40
  • 3
    @vfclists Actually, a tablespace on a ramdisk is a really bad idea. Don't do that. See http://www.postgresql.org/docs/devel/static/manage-ag-tablespaces.html, http://stackoverflow.com/q/9407442/398670 – Craig Ringer Jun 16 '14 at 00:54
  • 1
    @CraigRinger: To clarify for this particular question: It's a bad idea to mix with valuable data (and thanks for the warning). For unit-testing with a dedicated DB cluster, a ramdisk is fine. – Erwin Brandstetter Jul 12 '15 at 00:32
  • 2
    With docker-use being commonplace, some people have been successful with a tool like `testcontainers`, which essentially lets your test startup a throwaway, dockerized, postgres-instance. See https://github.com/testcontainers/testcontainers-java/blob/master/modules/jdbc-test/src/test/java/org/testcontainers/junit/SimplePostgreSQLTest.java – Hans Westerbeek Jan 11 '18 at 13:50
  • SCROLL DOWN to Rubms answer, there is now an embedded postgres library – ekcrisp Jan 03 '20 at 17:08
  • 1
    @ekcrisp. that's not a true embedded version of Postgres. It's just a wrapper library to make starting a Postgres instance (in a separate process) easier. Postgres will still run "outside" of the Java application and not "embedded" in the same process that runs the JVM –  Jan 03 '20 at 17:35
  • @a_horse_with_no_name thanks for clarifying. Still good to call out as it may be useful for anyone looking into this – ekcrisp Jan 03 '20 at 19:44
39

Now it is possible to run an in-memory instance of PostgreSQL in your JUnit tests via the Embedded PostgreSQL Component from OpenTable: https://github.com/opentable/otj-pg-embedded.

By adding the dependency to the otj-pg-embedded library (https://mvnrepository.com/artifact/com.opentable.components/otj-pg-embedded) you can start and stop your own instance of PostgreSQL in your @Before and @Afer hooks:

EmbeddedPostgres pg = EmbeddedPostgres.start();

They even offer a JUnit rule to automatically have JUnit starting and stopping your PostgreSQL database server for you:

@Rule
public SingleInstancePostgresRule pg = EmbeddedPostgresRules.singleInstance();
Rubms
  • 725
  • 7
  • 16
  • 1
    How is your experience with this package six months later? Works well, or riddled with bugs? – oligofren May 15 '18 at 14:53
  • @Rubms Did you migrate to JUnit5? How do you use the replacement of the `@Rule` with `@ExtendWith`? Just use the `.start()` in `@BeforeAll`? – Frankie Drake Mar 11 '19 at 10:05
  • I have not migrated to JUnit5, so I cannot yet answer your question. Sorry. – Rubms Mar 11 '19 at 11:10
  • This worked well. Thanks. Use following to create datasource in your spring config if you like: `DataSource embeddedPostgresDS = EmbeddedPostgres.builder().start().getPostgresDatabase();` – Sacky San Oct 25 '19 at 04:05
  • 1
    This isn't an in-memory postgres embedded in an application like the original question asked. It's a test automation tool that uses ephemeral Docker containers with some tooling around it. Very nice, but not embedded postgres. – Craig Ringer Oct 27 '22 at 04:21
20

You could use TestContainers to spin up a PosgreSQL docker container for tests: http://testcontainers.viewdocs.io/testcontainers-java/usage/database_containers/

TestContainers provide a JUnit @Rule/@ClassRule: this mode starts a database inside a container before your tests and tears it down afterwards.

Example:

public class SimplePostgreSQLTest {

    @Rule
    public PostgreSQLContainer postgres = new PostgreSQLContainer();

    @Test
    public void testSimple() throws SQLException {
        HikariConfig hikariConfig = new HikariConfig();
        hikariConfig.setJdbcUrl(postgres.getJdbcUrl());
        hikariConfig.setUsername(postgres.getUsername());
        hikariConfig.setPassword(postgres.getPassword());

        HikariDataSource ds = new HikariDataSource(hikariConfig);
        Statement statement = ds.getConnection().createStatement();
        statement.execute("SELECT 1");
        ResultSet resultSet = statement.getResultSet();

        resultSet.next();
        int resultSetInt = resultSet.getInt(1);
        assertEquals("A basic SELECT query succeeds", 1, resultSetInt);
    }
}
Andrejs
  • 26,885
  • 12
  • 107
  • 96
11

If you are using NodeJS, you can use pg-mem (disclaimer: I'm the author) to emulate the most common features of a postgres db.

You will have a full in-memory, isolated, platform-agnostic database replicating PG behaviour (it even runs in browsers).

I wrote an article to show how to use it for your unit tests here.

Olivier
  • 5,578
  • 2
  • 31
  • 46
  • that looks awesome! some useful tool I'm looking for. I missing CURRENT_TIMESTAMP, SUM(), enum support but the rest looks fine – hanskoff Sep 17 '20 at 15:46
  • @RodrigoManguinho What do you mean ? Could you open an issue giving more context ? like how you encountered the error, which version of pg-mem and Typeorm are installed, ... (it works on my machine) – Olivier Jan 29 '21 at 15:23
  • Hi Oliver. The only way I made it work was running a script manually to create my table. If I use the config option to synchronize it does not work. Tried to use synchronize on ormconfig options and with the connection instance. Both cases gives me error. – Rodrigo Manguinho Feb 04 '21 at 16:09
  • @Olivier just to give you more details. If I run connection.synchronize() I receive this error: QueryFailedError: column "columns.table_name" does not exist But if I run connection.query('create table ...') it works. The table is very simple with just two fields: id and name – Rodrigo Manguinho Feb 04 '21 at 16:19
  • @RodrigoManguinho I think some info is missing... connection.synchronize() is precisely supposed to create those tables. If exception comes from typeorm, you should have a more detailed exception (you recognize pg-mem exceptions because they have emojis in them) ... othersise, I think it might not be related to pg-mem. Post an issue in github if you have more details (its pretty hard to read here) – Olivier Feb 04 '21 at 17:00
  • @Olivier I created a very simple project with the error. Do you mind taking a look? https://github.com/rmanguinho/pg-mem-test – Rodrigo Manguinho Feb 04 '21 at 17:02
  • 1
    @RodrigoManguinho Okay, that's a problem arising with typeorm@0.2.30 (I only tested typeorm@0.2.29) ... I created an issue for that https://github.com/oguimbal/pg-mem/issues/53 – Olivier Feb 05 '21 at 20:01
9

There is now an in-memory version of PostgreSQL from Russian Search company named Yandex: https://github.com/yandex-qatools/postgresql-embedded

It's based on Flapdoodle OSS's embed process.

Example of using (from github page):

// starting Postgres
final EmbeddedPostgres postgres = new EmbeddedPostgres(V9_6);
// predefined data directory
// final EmbeddedPostgres postgres = new EmbeddedPostgres(V9_6, "/path/to/predefined/data/directory");
final String url = postgres.start("localhost", 5432, "dbName", "userName", "password");

// connecting to a running Postgres and feeding up the database
final Connection conn = DriverManager.getConnection(url);
conn.createStatement().execute("CREATE TABLE films (code char(5));");

I'm using it some time. It works well.

UPDATED: this project is not being actively maintained anymore

Please be adviced that the main maintainer of this project has successfuly 
migrated to the use of Test Containers project. This is the best possible 
alternative nowadays.
akvyalkov
  • 273
  • 5
  • 17
  • 1
    That must explode in all sorts of new and exciting ways if you use multiple threads, embed a JVM or Mono runtime, fork() your own child processes, or anything like that. *Edit*: It's not really embedded, it's just a wrapper. – Craig Ringer Jan 05 '20 at 10:18
7

If you can use docker you can mount postgresql data directory in memory for testing

docker run --tmpfs=/data -e PGDATA=/data postgres
Hichem MAALMI
  • 71
  • 1
  • 1
3

You can also use PostgreSQL configuration settings (such as those detailed in the question and accepted answer here) to achieve performance without necessarily resorting to an in-memory database.

Community
  • 1
  • 1
Dan
  • 7,155
  • 2
  • 29
  • 54
  • The OP's main issue is spinning up a Postgres instance in-memory, not for performance, but for simplicity in bootstrapping unit tests in a dev and CI environment. – triple.vee Jun 11 '20 at 08:00
0

If have full control over your environment, you arguably want to run postgreSQL on zfs.

Jona Engel
  • 369
  • 1
  • 13