I have an existing enterprise (non-App Store) legacy iOS application for iPad that I need to refactor (it was written by another developer, my predecessor at my current job).
This application fetches its data via JSON from a server backed by an MSSQL database. The database schema has about 30 tables; the most capacious are Client, City, and Agency, each having about 10,000 records, and further growth is expected. After the JSON is received (one JSON request-and-response pair for each table), it is mapped to Core Data, a process which also includes gluing the corresponding Core Data entities (Client, City, Agency and others) together, i.e. setting up the relations between these entities on the Core Data layer.
In itself, the project's Core Data fetch part (read part) is heavily optimized; it uses, I guess, almost all the performance and memory tweaks Core Data has, which is why the UI layer of the application is very fast and responsive, and I consider its work completely satisfactory and adequate.
The problem is the process of preparing the Core Data layer, i.e. the server-to-client synchronization process: it takes too much time. Consider 30 network requests resulting in 30 JSON packs (by "pack" I mean "one table, one JSON"), which are then mapped to 30 Core Data entities, which are then glued together (the appropriate Core Data relations are set between them). When I first saw how all this is done in this project (too slowly), the first idea that came into my head was:
"For the first time a complete synchronization is performed (app's first launch time) - perform a fetch of the whole database data in, say, one archived file (something like database dump) and then somehow import it as a whole to a Core Data land".
But then I realized that, even if transmitting such a one-file dump were possible, Core Data would still require me to glue the corresponding Core Data entities together to set the appropriate relations between them, so it is hard to imagine I would gain any performance from this scheme.
Also, my colleague suggested that I consider SQLite as a complete alternative to Core Data, but unfortunately I have no experience with it, which is why I am completely unable to foresee all the consequences of such a serious design decision (even with the synchronization process being very slow, my app does work, and its UI performance is very good now). The only thing I can imagine about SQLite is that, in contrast to Core Data, it would not push me to glue additional relations on the client side, because SQLite has its good old foreign key system, doesn't it?
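For illustration only, here is a minimal sketch of that foreign-key idea (this is my colleague's suggestion, not code from the project; the table and column names are assumptions based on the entities described above):

```objc
#import <sqlite3.h>

// Minimal sketch of the foreign-key idea: the relations live in the schema
// itself, so no separate "gluing" phase is needed after rows are inserted.
// Table and column names are assumptions, not the project's real schema.
static void createSchema(sqlite3 *db) {
    // Foreign key enforcement is off by default in SQLite and must be
    // enabled per connection.
    sqlite3_exec(db, "PRAGMA foreign_keys = ON;", NULL, NULL, NULL);

    const char *schema =
        "CREATE TABLE city   (id INTEGER PRIMARY KEY, name TEXT);"
        "CREATE TABLE agency (id INTEGER PRIMARY KEY, name TEXT,"
        "                     city_id INTEGER REFERENCES city(id));"
        "CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT,"
        "                     city_id   INTEGER REFERENCES city(id),"
        "                     agency_id INTEGER REFERENCES agency(id));";
    sqlite3_exec(db, schema, NULL, NULL, NULL);
}
```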
And so here are my questions (respondents, please do not mix these points up when you answer; I have too much confusion about all of them):
1. Does anybody have experience with the "first-time large import of the whole database" approach I have described above? I would be very thankful to know about any solutions, whether they exploit the JSON<->CoreData pair or not.
2. Does Core Data have some global import mechanism that allows mass creation of the corresponding 30-table schema (possibly using some specific source other than the "30 packs of JSON" described above) without the need to set up the corresponding relations for 30 entities?
3. Are there any possibilities to speed up the synchronization process if 2) is impossible? Here I mean improvements to the current JSON<->CoreData scheme my app uses.
4. Migration to SQLite: should I consider such a migration? What would I gain from it? What would the whole replication -> transmission -> client-preparation process look like then?
5. Other alternatives to Core Data and SQLite: what could they be or look like?
6. Any other thoughts or visions you may have about the situation I've described?
UPDATE 1
Though the answer written by Mundi is good (one large JSON, "No" to SQLite), I am still interested in any other insights into the problem I've described.
UPDATE 2
I did try to use my Russian English the best way I could to describe my situation, hoping my question would become clear to everyone who reads it. In this second update I will try to provide some more guidance to make my question even clearer.
Please consider two dichotomies:
- What can/should I use as the data layer on the iOS client: CoreData vs SQLite?
- What can/should I use as the transport layer: JSON (single-JSON-at-once as suggested in the answer, maybe even zipped) or some DB-itself dumps (if that is even possible, of course; notice I am also asking about this in my question)? A sketch of the single-JSON option follows right below.
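To make the single-JSON-at-once option concrete, here is a rough sketch; the shape of the combined payload and its top-level keys are purely my assumption (the server would have to be changed to produce such a payload):

```objc
#import <Foundation/Foundation.h>

// Sketch only: assumes the server can return ONE payload carrying all 30
// tables, e.g. { "City": [...], "Agency": [...], "Client": [...], ... }.
static NSDictionary *parseCombinedPayload(NSData *payloadData)
{
    NSError *error = nil;
    NSDictionary *allTables = [NSJSONSerialization JSONObjectWithData:payloadData
                                                              options:kNilOptions
                                                                error:&error];
    // The import would then proceed in dependency order:
    // City -> Agency -> Client.
    return allTables;
}
```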
I think it is pretty obvious that the "sector" formed by the intersection of these two dichotomies, choosing CoreData from the first one and JSON from the second, is the most widespread default in the iOS development world, and it is also what my app from this question uses.
That said, I would be thankful to see answers regarding the CoreData-JSON pair as well as answers considering any other "sectors" (what about opting for SQLite and some kind of dump-based approach, why not?).
Also, it is important to note that I don't want to simply drop the current option for some other alternative; I just want the solution to work fast in both the synchronization and the UI phases of its usage. So answers about improving the current scheme, as well as answers suggesting other schemes, are welcome!
Now, please see the following update #3, which provides more details on my current CoreData-JSON situation:
UPDATE 3
As I have said, currently my app receives 30 packs of JSON, one pack per table. Let's take the capacious tables as an example: Client, Agency, City.
It is Core Data, so if a client record has a non-empty `agency_id` field, I need to create a new Core Data entity of class Client (an NSManagedObject subclass) and fill it with this record's JSON data. For that I need to already have the corresponding Core Data entity for this agency, of class Agency (also an NSManagedObject subclass), and finally I need to do something like `client.agency = agency;` and then call `[currentManagedObjectContext save:&error]`. Having done it this way, I can later fetch this client and ask its `.agency` property to find the corresponding entity. I hope I am completely sane when I do it this way.
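In code, that pattern looks roughly like this (a simplified sketch, not the project's actual code; `serverId` is my assumed name for the attribute storing the server-side primary key, and the `name` property is just for illustration):

```objc
#import <CoreData/CoreData.h>
#import "Client.h"   // hypothetical NSManagedObject subclasses
#import "Agency.h"

// Simplified sketch of the per-record pattern described above.
// `json` is one record from the Client JSON pack.
static void importClientRecord(NSDictionary *json, NSManagedObjectContext *context)
{
    Client *client = [NSEntityDescription insertNewObjectForEntityForName:@"Client"
                                                   inManagedObjectContext:context];
    client.name = json[@"name"];

    NSNumber *agencyId = json[@"agency_id"];
    if (agencyId != nil) {
        // The Agency entity must already exist, hence the dependency order.
        NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Agency"];
        request.predicate = [NSPredicate predicateWithFormat:@"serverId == %@", agencyId];
        Agency *agency = [[context executeFetchRequest:request error:NULL] lastObject];
        client.agency = agency;   // the "gluing" step
    }

    NSError *error = nil;
    [context save:&error];
}
```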
Now imagine this pattern applied to the following situation:
I have just received 3 separate JSON packs: 10,000 clients, 4,000 cities, and 6,000 agencies (a client has one city, a city has many clients; a client has one agency, an agency has many clients; an agency has one city, a city has many agencies).
Now I want to set up the following relations on the Core Data level: I want my client entity `client` to be connected to its corresponding city and corresponding agency.
The current implementation of this in the project does a very ugly thing:
Since the dependency order is City -> Agency -> Client, i.e. the cities need to be baked first, the application begins by creating City entities and persisting them to Core Data.
Then it deals with the agencies' JSON: it iterates through every JSON record, and for every agency it creates a new entity `agency`, fetches the corresponding entity `city` by its `city_id`, and connects them via `agency.city = city`. After the iteration through the whole agencies JSON array is done, the current managed object context is saved (actually, `-[NSManagedObjectContext save:]` is done several times, once after every 500 records processed). At this step it is obvious that fetching one of the 4,000 cities for every one of the 6,000 agencies has a huge performance impact on the whole synchronization process.

Then, finally, it deals with the clients' JSON: as in the previous stage, it iterates through the whole 10,000-element JSON array and one by one performs the fetch of the corresponding agencies and, ZOMG, cities, which hurts the overall performance in the same way stage 2 does.
It is all very BAD.
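To make the badness concrete, here is a reconstruction of stage 2 (not a verbatim excerpt from the project; the `serverId` attribute, the `name` property, and the function name are my assumptions). Stage 3 has the same shape, only with two such fetches per client:

```objc
#import <CoreData/CoreData.h>
#import "Agency.h"   // hypothetical NSManagedObject subclass

// Reconstruction of the current stage 2: every agency record triggers its
// own fetch against the store just to find its city.
static void importAgencies(NSArray *agenciesJSON, NSManagedObjectContext *context)
{
    NSUInteger processed = 0;
    for (NSDictionary *json in agenciesJSON) {
        Agency *agency = [NSEntityDescription insertNewObjectForEntityForName:@"Agency"
                                                       inManagedObjectContext:context];
        agency.name = json[@"name"];

        // The expensive part: one store round-trip per record.
        NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"City"];
        request.predicate = [NSPredicate predicateWithFormat:@"serverId == %@",
                                                             json[@"city_id"]];
        agency.city = [[context executeFetchRequest:request error:NULL] lastObject];

        if (++processed % 500 == 0) {
            [context save:NULL];   // saves are batched every 500 records
        }
    }
    [context save:NULL];
}
```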
The only performance optimization I can see here is that the first stage could leave behind a big dictionary with city ids as keys (I mean NSNumbers of the real ids) and faulted City entities as values, which would make it possible to avoid the ugly find process in the following stage 2, and then to do the same on stage 3 using an analogous caching trick. But the problem is that there are many more relations between all the 30 tables than the just-described [Client-City, Client-Agency, Agency-City], so the final procedure involving caching of all the entities would most probably exhaust the resources the iPad reserves for my app.
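For clarity, here is a sketch of how that caching trick could look for stage 2 (again with the hypothetical `serverId` attribute and helper name); it trades N fetches for one fetch plus N dictionary lookups:

```objc
#import <CoreData/CoreData.h>
#import "Agency.h"   // hypothetical NSManagedObject subclasses
#import "City.h"

// Sketch of the proposed caching trick: fetch all City entities once, build
// an id -> entity lookup table, and glue agencies via a dictionary access
// instead of a per-record store round-trip.
static void importAgenciesWithCityCache(NSArray *agenciesJSON,
                                        NSManagedObjectContext *context)
{
    NSFetchRequest *cityRequest = [NSFetchRequest fetchRequestWithEntityName:@"City"];
    NSArray *allCities = [context executeFetchRequest:cityRequest error:NULL];

    NSMutableDictionary *citiesById =
        [NSMutableDictionary dictionaryWithCapacity:allCities.count];
    for (City *city in allCities) {
        citiesById[city.serverId] = city;
    }

    for (NSDictionary *json in agenciesJSON) {
        Agency *agency = [NSEntityDescription insertNewObjectForEntityForName:@"Agency"
                                                       inManagedObjectContext:context];
        agency.name = json[@"name"];
        agency.city = citiesById[json[@"city_id"]];   // no fetch, just a lookup
    }
    [context save:NULL];
}
```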
UPDATE 4
Message for future respondents: I've tried my best to make this question well-detailed and well-formed, and I really hope for similarly detailed answers. It would be great if your answer addressed the actual complexity of the problem discussed here and complemented the effort I've made to make my question as clear and general as possible. Thanks.
UPDATE 5
Related topics: "Core Data on client (iOS) to cache data from a server Strategy", "Trying to make a POST request with RestKit and map the response to Core Data".
UPDATE 6
Even though it is no longer possible to open new bounties and there is an accepted answer, I would still be glad to see any other answers containing additional information about the problem this topic addresses. Thanks in advance.