
I want to build an application that uses data from several endpoints.

Let's say I have:

  • JSON API for getting cinema data
  • XML Export for getting data about ???
  • Another JSON API for something else
  • A CSV file for some more stuff ...

In my application I want to bring all this data together and build views for it and so on ...

My idea was to set up a database and create schemas for all these data sources, so that I can write some kind of "import scripts" which I can call whenever I want to get the latest data.

I thought of schemas because I want to be able to easily adapt to a new API with any kind of schema.
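As an illustration of what such import scripts might look like, here is a minimal sketch with one small parser per source format, each returning a plain list of dicts that a later step could write to the database. The payloads, field names, and the `entry` tag are hypothetical placeholders, not taken from any real endpoint.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return [dict(item) for item in json.loads(text)]


def from_xml(text, item_tag):
    """Turn each <item_tag> element into a dict of its child tags and text."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in item} for item in root.iter(item_tag)]


def from_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical sample payloads standing in for the real sources.
json_payload = '[{"id": 1, "title": "Some Film"}]'
xml_payload = "<export><entry><id>7</id><name>Example</name></entry></export>"
csv_payload = "id,label\n3,Sample\n"

records = {
    "cinema": from_json(json_payload),
    "xml_export": from_xml(xml_payload, "entry"),
    "csv_file": from_csv(csv_payload),
}
```

Note that XML and CSV values arrive as strings, so type conversion would have to happen per source before the data is stored.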

Please enlighten me about the possibilities and best practices out there (theory and practice if possible :P)

SVARTBERG

2 Answers

0

You are totally right on making a database. But the real problem is probably not going to be how to store your data. It's going to be how to make it fit together logically and semantically.

I suggest you first take a good look at what your endpoints can provide. Get several samples from every source and analyze them if you can. How will you know which data is new? How can you match it against existing data and against data from other sources? If existing data changes or gets deleted, how will you detect and handle that? What if sources disagree on something? How and when should you run the synchronization? What will you do if one of your sources goes down? Etc.
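To make the "which data is new, changed, or deleted" question concrete, here is a minimal sketch that compares stored records with a fresh fetch. It assumes every source exposes a stable unique key (called `id` here, which is an assumption, not something the sources are known to provide); the function names and sample records are hypothetical.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record's content; key order does not matter."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def diff_by_key(stored, incoming, key="id"):
    """Classify incoming records as new, changed, or deleted relative to stored."""
    old = {r[key]: fingerprint(r) for r in stored}
    new = {r[key]: fingerprint(r) for r in incoming}
    return {
        "new": sorted(k for k in new if k not in old),
        "changed": sorted(k for k in new if k in old and new[k] != old[k]),
        "deleted": sorted(k for k in old if k not in new),
    }


# Hypothetical data: what is in the database vs. what the source returns now.
stored = [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]
incoming = [{"id": 2, "title": "B (updated)"}, {"id": 3, "title": "C"}]
result = diff_by_key(stored, incoming)
```

If a source has no reliable key, this kind of matching breaks down, which is exactly why it is worth inspecting samples first.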

It is extremely difficult to make data consistent if your data sources are not. As a rule, if the sources are different, they are not consistent. Hence the proverb "garbage in, garbage out". We humans have no problem dealing with small inconsistencies, but algorithms cannot work correctly if there are discrepancies. Even if everything fits together on paper, one usually forgets that data can change over time...

At least that's my experience in such cases.

jurez
  • If I always want the latest data, can't I just empty the tables and read the new data from the data source? – SVARTBERG Oct 26 '17 at 19:59
  • You didn't provide many details about your project, but if you can afford to clear and reimport the data every time, it is actually the simplest approach. If your tables are not connected, you won't have any discrepancies either, but in that case I'd be worried about how useful your application will actually be :-) – jurez Oct 26 '17 at 20:05
  • And what about querying all the current data, getting all the new data, creating a diff from the two and then patching it? (I used jsondiffpatch once) – SVARTBERG Oct 26 '17 at 20:26
  • That could work, too. But diffing large JSON objects could be time and memory consuming, so it's not guaranteed to be faster than dropping/reimporting. It's certainly not simpler. Again, I don't know your details. – jurez Oct 27 '17 at 07:16
  • There are no further details yet. I basically want it to be a platform that should put things together that are actually different. For example, think of a city app. You get data for events from X, data for restaurants, etc. from Y and data for local garbage collection from Z. You would want citizens to get an application that displays all that data in one app. Just an example. – SVARTBERG Oct 27 '17 at 07:41
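The "empty the tables and reimport" approach from the comments above could look roughly like the following sketch. It uses SQLite only for illustration; the `events` table and its columns are hypothetical. Wrapping the delete and the inserts in one transaction means a failed import leaves the previous data in place.

```python
import sqlite3


def full_refresh(conn, rows):
    """Replace the whole table contents in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM events")
        conn.executemany("INSERT INTO events (id, title) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# First import, then a later import that fully replaces it.
full_refresh(conn, [(1, "Market"), (2, "Concert")])
full_refresh(conn, [(2, "Concert (moved)"), (3, "Festival")])

titles = [row[0] for row in conn.execute("SELECT title FROM events ORDER BY id")]
```

This stays simple as long as no other table references the rows being deleted; once tables are connected, a full wipe can invalidate those references.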
0

I'm not sure whether you want to display all the data in the same view or create different views for each of the sources. If you want to display the data in the same view, like a grid, I would recommend using inheritance or an interface, depending on your data and needs. I would recommend setting up this structure in the database too, using different tables for the different sources and a parent table that is related to all of them and has a type associated with it.
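One possible reading of the parent-table idea is sketched below, using SQLite for illustration. The table names, columns, and the two types (`screening`, `restaurant`) are hypothetical; the point is only that shared fields live in the parent table and source-specific fields in child tables keyed by the parent's id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    id    INTEGER PRIMARY KEY,
    type  TEXT NOT NULL CHECK (type IN ('screening', 'restaurant')),
    title TEXT NOT NULL
);
CREATE TABLE screening (
    item_id   INTEGER PRIMARY KEY REFERENCES item(id),
    starts_at TEXT NOT NULL
);
CREATE TABLE restaurant (
    item_id INTEGER PRIMARY KEY REFERENCES item(id),
    cuisine TEXT NOT NULL
);
""")

with conn:
    # Each record gets a parent row plus one row in its type-specific table.
    cur = conn.execute("INSERT INTO item (type, title) VALUES ('screening', 'Some Film')")
    conn.execute(
        "INSERT INTO screening (item_id, starts_at) VALUES (?, ?)",
        (cur.lastrowid, "2017-10-28T20:00"),
    )
    cur = conn.execute("INSERT INTO item (type, title) VALUES ('restaurant', 'Some Place')")
    conn.execute(
        "INSERT INTO restaurant (item_id, cuisine) VALUES (?, ?)",
        (cur.lastrowid, "Italian"),
    )

# A combined grid view only needs the parent table.
grid = conn.execute("SELECT type, title FROM item ORDER BY id").fetchall()
```

Details for a single row can then be fetched by joining `item` with the child table named by its `type`.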

Here's a good thread discussing how to choose between an interface and inheritance: "Inheritance vs. interface in C#"

And here are some examples of representing inheritance in a database: "How can you represent inheritance in a database?"

Pat
  • Thanks. Just to make one thing clear: I think it would be totally OK if every data source had its own database table. I don't think I need to mix them. Let's say I have a database table for cinema screenings and a database table for weather data (just an example; I guess in the case of weather I would not have to keep it in a database). Then in my application I can query the cinema screening data and display it, and if someone views a specific screening, I have a date with which I can query the weather table to see what the weather will be on that date. Really bad example, but you get the point. – SVARTBERG Oct 26 '17 at 19:58
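The separate-tables scenario from this comment might be sketched as follows, again with SQLite for illustration. The `screening` and `weather` tables, their columns, and the sample rows are hypothetical; the two tables are only linked at query time through the screening's date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE screening (id INTEGER PRIMARY KEY, film TEXT, starts_at TEXT);
CREATE TABLE weather (day TEXT PRIMARY KEY, forecast TEXT);
INSERT INTO screening VALUES (1, 'Some Film', '2017-10-28 20:00');
INSERT INTO weather VALUES ('2017-10-28', 'rain');
""")


def screening_with_weather(conn, screening_id):
    """Fetch one screening plus the forecast for its date, if there is one."""
    return conn.execute(
        """
        SELECT s.film, s.starts_at, w.forecast
        FROM screening AS s
        LEFT JOIN weather AS w ON w.day = date(s.starts_at)
        WHERE s.id = ?
        """,
        (screening_id,),
    ).fetchone()


row = screening_with_weather(conn, 1)
```

The `LEFT JOIN` keeps the screening in the result even when no forecast exists for that day, in which case the forecast column is `None`.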