13

The Setup

I have native iOS and Android apps which sync data to and from my webserver. A requirement of the apps is that they work offline so data is stored on the apps in sqlite databases.

The apps communicate with the server with a series of REST calls which send JSON from the server for the apps to store in their databases.

My Problem

The scale of this data is very large, some tables can have a million records, and the final size of the phone databases can approach 100mb.

The REST endpoints must limit their data and have to be called many times with different offsets for a whole sync to be achieved.

So I'm looking for ways to improve the efficiency of this process.

My Idea

An idea I had was to create a script which would run on the server which would create an sqlite file from the servers database, compress it and put it somewhere for the apps to download. Effectively creating a snapshot of the server's current data.

The apps would download this snapshot but still have to call their REST methods in case something had changed since the snapshot happened.

The Question

This would add another level of complexity to my webapp and I'm wondering if this is the right approach. Are there other techniques that people use when syncing large amounts of data?

Iulian Onofrei
  • 9,188
  • 10
  • 67
  • 113
John Webb
  • 381
  • 3
  • 14
  • The script idea is not too bad, but I'm wondering with what kind of business model are you dealing with? What type of information are you showing that the App users need to have the information online all the time? Could you add a little bit more of information about that? Maybe you can get rid of some columns that are useless in the App client. – 4gus71n Jul 06 '18 at 13:57
  • Your idea sounds pretty good to me. Google Translate does that for their offline translation and Google Maps does same for offline custom maps. Http protocol implements caching so you can always check whether the file was updated or not. – Gennadii Saprykin Jul 06 '18 at 14:07
  • So the data is basically a large address book which have coordinates. The users then visit the houses which are closest to them. It's possible that the addresses are in very rural areas and a user wouldn't have to sync on wifi before they left. I think it is simpler for users to just sync all the data rather than the user have to say which segment they wanted to download. – John Webb Jul 06 '18 at 14:19
  • Whether to sync the whole thing on a device or not depends on what kind of experience you want to give to the user. Downloading everything will make your app easier to use but I'm pretty sure there are many users who wouldn't want your app to take that much space on their phone. It's usually a good idea to ask the user first, and download such a big chunk of data only after their agreement. Otherwise I'd go with online vs partial caching option. – Gennadii Saprykin Jul 06 '18 at 14:23

2 Answers2

2

This is a complex question, as the answer should depend on your constraints:

  1. How often will data change? If it is too often, then the snapshot will get out of date really fast, thus apps will be effectively updating data a lot. Also, with the big volume of data, an application will waste CPU time on synchronization (even if user is not actively using all of that data!), or may become quickly out of sync with the server - this is especially true for iOS where Applications have very limited background capabilities (only small window, which is throttled) compared to Android apps.

  2. Is that DB read-only? Are you sending updates to the server? If so, then you need to prepare conflict resolution techniques and cover cases, in which data is modified, but not immediately posted to the server.

  3. You need to support cases when DB scheme changes. Effectively in your approach, you need to have multiple (initial) databases ready for different versions of your application.

Your idea is good in case there are not too many updates done to the database and regular means of download are not efficient (which is what you generally described: sending millions of records through multiple REST calls is quite a pain).

But, beware of hitting a wall: in case data changes a lot, and you are forced to update tens/hundreds of thousands of records every day, on every device, then you probably need to consider a completely different approach: one that may require your application to support only partial offline mode (for most recent/important items) or hybrid approach to data model (so live requests performed for most recent data in case user wants to edit something).

deekay
  • 899
  • 6
  • 8
  • The largest of the tables would only have significant changes once a week. With many tiny changes happening daily but this would maybe effect less than 1000 rows a week. Large changes happen when a new excel is imported or deleted. I was thinking of using these moments to trigger the snapshot creation script. – John Webb Jul 06 '18 at 15:11
  • 1000 changes/week is completely manageable through REST API. Question is: once huge diff (excel is imported) happens, are you planning to re-fetch whole database from scratch on every device? If so, take into consideration how often that happens and how big database file will be (all the implications: background download, unreliable connection, possible download on cellular network and its impact on the user) and how to handle that from UX perspective. – deekay Jul 06 '18 at 17:39
0

100mb is not so big. My apps have been synching many GBs at this point. If your data can be statically generated and upated , then one thing you can do is write everything to the server, (json, images, etc...) and then sync all on your local filesystem. In my case I use S3. At a select time or when the user wants to, they sync and it only pulls/updates what's changed. AWS actually has an API call called sync on a local/remote folder or bucket. A single call. I do mine custom, but essentially it's the same, check the last update date and file size locally and if it's different, you add that to the download queue.

Iulian Onofrei
  • 9,188
  • 10
  • 67
  • 113
yeahdixon
  • 6,647
  • 1
  • 41
  • 43