So. You already mentioned available solutions.
TL;DR; I prefer first option.
Cluster
In general, recommended layout is 3 nodes (2 slaves + 1 master).
But your layout - 2 nodes (1 master + 1 slave + 1 arbiter) is viable too. Especially if one server can handle your workload.
Good things:
- Almost "real-time" replica.
- Possibility to utilise resources to handle bigger workload.
- Better availability.
Notes:
- If you have 10mb/sec write load on master, then same load will be applied on slave node. This shouldn't affect reads from slave at all (except write load is REALLY huge).
- Maintenance costs are bigger, then single-instance installation. You should plan how to handle cluster upgrades, configuration updates, plugin updates.
- Branched data. In clustered environment there is possibility to end up in "split-brain" scenario, when 2 nodes have different data and decision should be made which data should be kept. Neo4j handles such cases quite good. But you should keep in mind that small data-loss can occur in VERY RARE scenarios.
Backup
Good things:
- Simple. Just do backups from database.
- Consistency check. When backup is made, tool runs consistency check to verify if database is not damaged. There is no possibility that Backup will screw up live database. If there any issues - you will be notified via logs from backup utility. See below detailed info on to how backup is performed.
- Database. Neo4j backup is fully-functional database. You can spin-up server that points to backup database, and do everything you wan't.
- Incremental backups. You can do incremental backups as often, as you wan't.
Notes:
- Neo4j scales vertically very well (depends on size of database). It can handle huge load on single instance (we had up to 3k requests/second on medium machine). So, you can get one bigger machine for Neo4j server and other smaller (cheaper) for backups.
How backup is performed?
One thing that should be kept in mind - live database is still fully operational. Backup utility doesn't not stop or prevent any actions.
When transaction in database is committed, all changes are appended to transaction log.
- When there are no previous backup present: copy whole storage.
- When there is previous backup AND transaction logs are available: copy new transaction logs and replay them on to storage.
- When there is previous backup AND transactions are NOT available: discard existing storage, copy existing storage.
Why transaction logs can not be available? Your configuration may say to keep only latest transaction logs (i.e. 1 hour), or not to keep at all.
Relevant settings:
Other
Anyway, you should consider making backups event in clustered environment. Everything can fail, in any moment.
In general - everything depends on your load and database size.
If your database is small enough to fully fit in memory and one machine is enough to handle all load, then one Neo4j instance will be enough. Just do backup.
If you wan't better scalability/availability and real-time working replica, then cluster setup is best choice.