
I am currently working on a project where I need to do database synchronization. We have a main database on a server, and a webapp on it to interact with the data. But since this data is geographic (complex polygons and some points), it is more convenient and more efficient for the users to have a local database when working on the polygons (we use QGIS), and then upload the changes to the server. But while a user was working locally, it is possible that some points were modified on the server (points can only be edited on the server). This is why I need the ability to synchronize the databases.

Having a history of the INSERT, UPDATE and DELETE operations on the points, both in the local database and in the server database, should be enough to reconstruct a history of the points and then synchronize.
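To make the idea concrete, here is a minimal sketch of the history-table-plus-triggers pattern, using plain SQLite and a toy schema (the table and column names are invented for illustration; a real Spatialite layer would store the geometry in a proper geometry column rather than as text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Spatialite file
conn.executescript("""
CREATE TABLE points (
    id INTEGER PRIMARY KEY,
    name TEXT,
    geom TEXT  -- toy stand-in for a real Spatialite geometry column
);

-- History table: one row per action, recording the values at that moment
CREATE TABLE points_history (
    hid INTEGER PRIMARY KEY AUTOINCREMENT,
    action TEXT NOT NULL,                    -- 'I', 'U' or 'D'
    changed_at TEXT DEFAULT (datetime('now')),
    id INTEGER,
    name TEXT,
    geom TEXT
);

CREATE TRIGGER points_ai AFTER INSERT ON points BEGIN
    INSERT INTO points_history (action, id, name, geom)
    VALUES ('I', NEW.id, NEW.name, NEW.geom);
END;

CREATE TRIGGER points_au AFTER UPDATE ON points BEGIN
    INSERT INTO points_history (action, id, name, geom)
    VALUES ('U', NEW.id, NEW.name, NEW.geom);
END;

CREATE TRIGGER points_ad AFTER DELETE ON points BEGIN
    INSERT INTO points_history (action, id, name, geom)
    VALUES ('D', OLD.id, OLD.name, OLD.geom);
END;
""")

conn.execute("INSERT INTO points (id, name, geom) VALUES (1, 'doubt', 'POINT(1 2)')")
conn.execute("UPDATE points SET name = 'solved' WHERE id = 1")
conn.execute("DELETE FROM points WHERE id = 1")

print(list(conn.execute("SELECT action, name FROM points_history ORDER BY hid")))
# → [('I', 'doubt'), ('U', 'solved'), ('D', 'solved')]
```

The same pattern works on PostGIS with PL/pgSQL trigger functions; only the trigger syntax differs.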

By the way, we use Spatialite for the local databases and PostGIS for the main server database.

I found a bunch of resources on how to do this using triggers on databases:

http://database-programmer.blogspot.com/2008/07/history-tables.html

How to Store Historical Data

...

But I could not find any tool or library for doing this without having to write the triggers manually. For my needs I could absolutely do it manually, but I feel like it is also something that could be made easier and more convenient with a dedicated command-line/API tool. The tool would, for instance, generate history tables and triggers for the tables where the user wants to track a history, and we could also imagine different options such as:

  • Which columns do we want to track?
  • Do we only want to track the actions, or also the values?
  • ...
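As a thought experiment, the trigger-generation core of such a tool could be quite small. Below is a hedged sketch of what it might look like for the SQLite backend (the function name, the options and the API shape are all invented; a real tool would also quote identifiers, handle geometry columns and provide a PostgreSQL/PL/pgSQL backend):

```python
def history_sql(table, columns, track_values=True):
    """Generate SQLite DDL that tracks INSERT/UPDATE/DELETE on `table`.

    `columns` are the columns to record; with track_values=False only
    the action and its timestamp are kept, not the values.
    """
    col_defs = "".join(f", {c}" for c in columns) if track_values else ""
    ddl = [f"""CREATE TABLE {table}_history (
    hid INTEGER PRIMARY KEY AUTOINCREMENT,
    action TEXT NOT NULL,
    changed_at TEXT DEFAULT (datetime('now')){col_defs});"""]

    # One AFTER trigger per action; DELETE must read OLD instead of NEW.
    for suffix, event, ref in (("ai", "INSERT", "NEW"),
                               ("au", "UPDATE", "NEW"),
                               ("ad", "DELETE", "OLD")):
        names = "".join(f", {c}" for c in columns) if track_values else ""
        values = "".join(f", {ref}.{c}" for c in columns) if track_values else ""
        ddl.append(f"""CREATE TRIGGER {table}_{suffix} AFTER {event} ON {table} BEGIN
    INSERT INTO {table}_history (action{names})
    VALUES ('{event[0]}'{values});
END;""")
    return "\n".join(ddl)

print(history_sql("points", ["id", "geom"]))
```

A command-line wrapper would then just read the tracked tables and options from a config file and apply the generated DDL to the chosen backend.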

So, to conclude, my questions are:

  • Is there any existing tool doing this? I searched and found nothing.
  • Do you think it would be feasible/relevant to implement such a tool? I was thinking of doing it in Python (since my project is Django-powered), with support for different backends (right now I need SQLite/Spatialite and PostgreSQL/PostGIS)...

Thanks for your answers,

Dim'

    Well, you're making quite a large assertion here. "Having a history (...) on the local database and the same on the server (...) should be enough." I don't think this is true in general. You're going to have to think very carefully about what happens when (not if, when) two users change the same data while both are offline. Take a look at how other distributed systems (like [CouchDB](https://couchdb.apache.org/) for instance) cope with that kind of problem. – Wander Nauta Jul 19 '15 at 22:59

4 Answers


Check out GeoGig. GeoGig can track and synchronize geodata from various sources, e.g. PostGIS, Esri shapefiles and Spatialite. It implements the typical Git workflow, but on data. You have a data repository on a server which can be cloned, pulled from and pushed to from your local workstation.

GeoGig is a young project, still in beta but already powerful and feature-rich, with the ability to merge different commits, create diffs, switch branches, track history and perform all the other typical Git tasks.

An example of a typical GeoGig workflow, using its comfortable command-line interface:

# on http://server, initialize and start the remote repository on port 8182 (default)
geogig init
geogig serve

# on local, clone the remote repository to your machine
geogig clone http://server:8182 your_repository

cd your_repository/

# on local, import in geogig the data you are working on (Postgis)
geogig pg import --schema public --database your_database --user your_user --password your_pass --table your_table

# on local, add the local changes
geogig add

# on local, commit your changes
geogig commit -m "First commit"

# on local, push to the remote repository
geogig push
Tom-db
  • Thank you! GeoGig seems like a great candidate to solve the problem! It gives history + synchronization. I'm already using Git a lot, so it won't be painful to get into GeoGig. Now, I "just" have to think of a nice workflow to ensure the integrity of the data when updates are made by users, and to hide the GeoGig-ish technical aspects from the users as much as possible, letting them work locally with files as they are used to. I am very impressed by the capabilities of GeoGig! I hope it will be pursued; the last commit on their GitHub repository was several months ago... – Dim' Jul 22 '15 at 03:42

You could ask Bucardo to do the heavy lifting in terms of multi-master synchronization. Have a look at https://bucardo.org/wiki/Bucardo - they promise they can even synchronize between different types of databases, e.g. PostgreSQL <-> SQLite: http://blog.endpoint.com/2015/08/bucardo-postgres-replication-pgbench.html

I'm not sure about special geospatial capabilities though (synchronizing only regions).

GeoGig is definitely worth a try. You can plug a GeoGig repository directly into GeoServer to serve WMS and to edit features via Web/WFS.

Sebastian
  • Thank you for answering, Sebastian, but I already solved my problem using GeoGig. It is a very promising project! I'll also have a look at Bucardo, it looks interesting too. – Dim' Sep 09 '15 at 23:10

As Wander hinted at, this is not as simple as "having a history of INSERT, UPDATE and DELETE" and keeping them synchronized. There's a lot going on under the hood. There are plenty of DBMS tools for replication/mirroring. Here is one example for PostgreSQL: pgpool.

David G

Thanks for the answers, Wander Nauta and David G. I totally agree that, in general, performing synchronization is not as simple as this. I should have given more details, but in my case I believed it could be enough because:

  • The local data is always a subset of the server data, and each user is assigned a subset. So there is always only one person working offline on a given subset.
  • On the server, the users can only modify/delete the data they created.

To give more information on the context: each user is locally digitizing an area in a district, from aerial images. Each user is assigned a district to digitize, and is able to upload his work to the server. On the server, through a webapp, the users can consult everyone's work, post problem points and comment on them, mainly to point out a doubt or an omission in the digitizing. What I want is for the users to be able to download a copy of the district they're working on with the points added by their colleagues, solve the problems locally, delete the points, possibly add new doubts, and upload again.
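Under these constraints (disjoint subsets, one offline writer per subset), the merge step really does reduce to replaying both histories, since the two logs never touch the same row. A minimal sketch, with an invented record format of (action, point_id, value) tuples just for illustration:

```python
# Sketch of the merge under the one-writer-per-subset assumption:
# local and server histories touch disjoint sets of points, so they
# can be replayed in any order without conflict resolution.

def replay(history, state):
    """Apply a list of (action, point_id, value) records to `state`."""
    for action, pid, value in history:
        if action == "D":
            state.pop(pid, None)
        else:                      # 'I' and 'U' both just set the value
            state[pid] = value
    return state

def synchronize(local_history, server_history, state):
    # Disjoint ownership means the relative order of the two logs
    # does not matter; replay one after the other.
    replay(local_history, state)
    replay(server_history, state)
    return state

state = {1: "doubt here?", 2: "missing building"}
local = [("U", 1, "solved"), ("D", 2, None)]   # user's offline edits
server = [("I", 3, "check this road")]         # colleague's new point
print(synchronize(local, server, state))
# → {1: 'solved', 3: 'check this road'}
```

The hard distributed-systems problems Wander mentioned only appear once two writers can touch the same row; as long as the ownership rule is enforced, this naive replay is sound.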

There is not really a master/slave relation between a local database and the server one; each has a specific role. Because of this, I am not sure that replication/mirroring will meet my needs, but maybe I'm wrong? Also, I'd like to avoid going with a too-sophisticated solution that exceeds the needs, and avoid adding too many new dependencies, because the needs won't evolve much.

Dim'