2

If I have a deep nested table (5 tier) relationship, what is a good method in retrieving the correlating data so that I can store them in my Java objects?

Here's what the tables look like:

 ___________   ___________   ___________   ___________   ___________
|  Table A  | |  Table B  | |  Table C  | |  Table D  | |  Table E  |
|___________| |___________| |___________| |___________| |___________|
| tableA_id | | tableB_id | | tableC_id | | tableD_id | | tableE_id |
| some_data | | some_data | | some_data | | some_data | | some_data |
|___________| | tableA_id | | tableB_id | | tableC_id | | tableD_id |
              |___________| |___________| |___________| |___________|

METHOD 1 is what I'm currently using

  1. A query to retrieve a row from the base table. I store this in ObjectA.
  2. A query to retrieve all rows from the second table matching the id of the previous query. I store them in List<ObjectB> inside ObjectA.
  3. For each ObjectB, a query to retrieve all rows of the third table matching the id of ObjectB. I store these in List<ObjectC> inside ObjectB.
  4. For each ObjectC, a query to retrieve all rows of the fourth table matching the id of ObjectC. I store these in List<ObjectD>inside ObjectC.
  5. For each ObjectD, a query to retrieve all rows of the fifth table matching the id of ObjectD. I store these in List<ObjectE>inside ObjectD.
  6. All of the above queries sit within a "db.beginTransaction" and a "db.endTransaction" statement so that if one query fails unnaturally (errors), ObjectA will not be created.

The method above has a lot of looping, and a lot of queries. I get the feeling that this isn't the way to go because I'm constantly sending and receiving data from the database at every iteration of my loops.

METHOD 2

Another method I was thinking of is to join the 5 tables together. This leads to just one query but the looping remains, because I will need to group and loop through the foreign id of each tier. At least this way, I get all the information up front. But I don't know how to dissect the data into their respective object once I have them in one lump. Here's what my join looks like:

SELECT * FROM (((tableA
JOIN tableB ON tableA._id = tableB.tableA_id)
JOIN tableC ON tableB._id = tableC.tableB_id)
JOIN tableD ON tableC._id = tableD.tableC_id)
JOIN tableE ON tableD._id = tableE.tableD_id

Now how would I go through the cursor in order to dissect the join and put their respective data into the right object?

chaser
  • 3,107
  • 4
  • 29
  • 34
  • 1
    Using joins to retrieve your database objects will lead to duplicate unnecessary data (let's say you have 100 `B` rows per `A` row). On the other hand, the 1st method causes [N+1 SELECT problem](http://stackoverflow.com/questions/97197/what-is-the-n1-selects-issue). If you don't expect any performance issue, I'd stay with 1st method. But it all depends on your dataset, its structure etc. – Luke Jul 05 '16 at 11:28
  • You have encountered the [object-relational impedance mismatch](https://en.wikipedia.org/wiki/Object-relational_impedance_mismatch). To avoid this, don't load entire objects, but do your operations on the data in the DB. – CL. Jul 05 '16 at 12:12
  • @qbeck, so it seems that both methods are bad either way. – chaser Jul 05 '16 at 12:41
  • @CL, there is no mismatch. There is just the need to store the the closely related data to an online database. If you're hinting that I should segregate the operations into smaller tasks such that I query and write to the database per user interaction, I've already thought of that. But that will not do because ObjectA must be editable without access to the database. User will only upload ObjectA to the database when they "commit" to it. When the the row of TableA is queried and all its child tables are queried, the row will be locked from editing. – chaser Jul 05 '16 at 12:53
  • If your dataset is totally unpredictable, both methods have drawbacks. At very first ask yourself _Do I really need to retrieve my whole database at once?_ Perhaps you can separate some queries to another parts of your application? If you have hundreds of rows, maybe it's sufficient to get only `TOP 100` per operations? You can also denormalize your database to improve performance at the expense of memory. Consider creating views and different instances of objects in order to achieve satisfying results. I know these are all just tips but the solution depends on the whole scope of your project. – Luke Jul 06 '16 at 08:48
  • Thank you for all the tips. Although i haven't come up with a solution yet I'm thinking to give user 2 options. The checkin and out capability to work without access to server. And the on demand request. This way user is in control of performance. Also maybe build a webservice to handle the request on server side to alleviate network traffic. Will post a solution once i have a working model. – chaser Jul 06 '16 at 10:14
  • I'm not sure what's your root problem because now you are speaking about concurrency in accessing database (not mapping your data into app). Maybe it's easier for you to put the app logic into stored procedures? – Luke Jul 06 '16 at 15:09

1 Answers1

0

After some thoughts the solution is two fold:

Pass the service performance decision over to the user:

To avoid running into constant performance problem, provide user with the ability to "checkout" objectA from the server. User will then be able to veiw/modify objectA while offline. Obviously the upfront performance cost is huge, but because it is not the default behavior, it isn't a concern.

To avoid the N+1 Selection problem and avoid unnecessary data transmission over the network:

Every nested table will store the root table reference key like this:

 ___________   ___________   ___________   ___________   ___________
|  Table A  | |  Table B  | |  Table C  | |  Table D  | |  Table E  |
|___________| |___________| |___________| |___________| |___________|
| tableA_id | | tableB_id | | tableC_id | | tableD_id | | tableE_id |
| some_data | | some_data | | some_data | | some_data | | some_data |
|           | | tableA_id | | tableB_id | | tableC_id | | tableD_id |
|___________| | tableA_id | | tableA_id | | tableA_id | | tableA_id |
              |___________| |___________| |___________| |___________|

When data of ObjectA is required to be checked out, a query is made in reverse order like this:

1. SELECT * FROM tableE WHERE tableA_id=?
2. Store the result in MAP<long, List>
3. Each unique entry of TablD_id are stored in a list, mapped by the unique id.

Repeat the above process for each table, the last being tableB. In this case, there are 5 total queries instead of N+1.

The app will then traverse the MAP to restore objectA in its entirety.

chaser
  • 3,107
  • 4
  • 29
  • 34