0

Scenario:

I am working with a what I think is a fairly large SQLite database (around 20 MB) in my Android app, which consists of around 50 tables.

Most of these tables are linked by foreign keys, and a lot of the time, I need to retrieve information from two or more tables at a time. To illustrate an example:

Table1:

Id  |  Name  |  Attribute1  |  Attribute2  |  ForeignKey

1   |  "Me"  |  SomeValue   |  AnotherVal  |     49
2   |  "A"   |     ...      |     ...      |     50
3   |  "B"   |              |              |     49

Table2:

Id  |  Attribute3  |  Attribute4  |  Attribute5

49  |   ThirdVal   |  FourthVal   |   FifthVal
50  |     ...      |     ...      |     ...

Sometimes, there are more than two tables that link together in this way. Almost all of the time, there are more columns than those presented above, and there are usually around 1000 rows.

My aim is to display a few of the attributes from the database as items in a RecyclerView, but I will need to use both tables to retrieve these attributes.


My method:

Currently, I am using the android-sqlite-asset-helper library to copy this database (.db extension) from the assets folder into the app. When I recorded the time for this copying to happen, it completed in 732 ms, which is fine.

However, when I want to retrieve the data from two tables using the foreign key from the first table, it takes far too long. It took around 11.47 seconds when I tested this, and I want to speed this up.

The way in which I retrieve the data is that I read each row in the first table, and put it into an object:

public static ArrayList<FirstItem> retrieveFirstItemList(Context context) {
    Cursor cursor = new DbHelper(context).getReadableDatabase()
            .query(DbHelper.TABLE_NAME, null, null, null, null, null, null);
    ArrayList<FirstItem> arrayList = new ArrayList<>();
    cursor.moveToFirst();
    while (!cursor.isAfterLast()) {
        // I read all the values from each column and put them into variables
        arrayList.add(new FirstItem(id, name, attribute1, attribute2, foreignKey));
        cursor.moveToNext();
    }
    cursor.close();
    return arrayList;
}

The FirstItem object would contain getter methods in addition to another used for getting the SecondItem object from the foreign key:

public SecondItem getSecondItem(Context context) {
    Cursor cursor = new SecondDbHelper(context).getReadableDatabase().query(
            SecondDbHelper.TABLE_NAME,
            null,
            SecondDbHelper.COL_ID + "=?",
            new String[] {String.valueOf(mForeignKey)},
            null, null, null);
    cursor.moveToFirst();
    SecondItem secondItem = new SecondItem(mForeignKey, attribute3, attribute4, attribute5);
    cursor.close();
    return secondItem;
}

When I print values from both tables into the logcat (I have decided not to use any UI for now, to test database performance), I use something like this:

for (FirstItem firstItem : DBUtils.retrieveFirstItemList(this)) {
    Log.d("First item id", firstItem.getId());
    Log.d("Second item attr4", firstItem.getSecondItem(this).getAttribute4());
}

I suspect there is something wrong with this method as it needs to search through Table2 for each row in Table1 - I think it's inefficient.


An idea:

I have one other method I am considering using, however I do not know if it is better than my current solution, or if it is the 'proper' way to achieve what I want. What I mean by this is that I am unsure as to whether there is a way I could slightly modify my current solution to significantly increase performance. Nevertheless, here is my idea to improve the speeds of reading data from the database.

When the app loads for the first time, data from various tables of the SQLite database would be read then put into one SQLite database in the app. This process would occur when the app is run for the first time and each time the tables from the database are updated. I am aware that this would result in duplication of data across different rows, but it is the only way I see that would avoid me having to search multiple tables to produce a list of items.

// read values from SQLite database and put them in arrays

ContentValues cv = new ContentValues();

// put values into variables

cv.put(COL_ID, id);
...
db.insert(TABLE_NAME, null, values);

Since this process would also take a long time (as there are multiple rows), I was a little concerned that this would not be the best idea, however I read about transactions in some Stack Overflow answers, which would increase write speeds. In other words, I would use db.beginTransaction();, db.setTransactionSuccessful(); and db.endTransaction(); appropriately to increase the performance when rewriting the data to a new SQLite database.

So the new table would look like this:

Id  |  Name  |  Attribute1  |  Attribute2  |  Attribute3  |  Attribute4  | Attribute5

1   |  "Me"  |  SomeValue   |  AnotherVal  |   ThirdVal   |   FourthVal  |  FifthVal
2   |  "A"   |     ...      |     ...      |     ...      |     ...      |     ...
3   |  "B"   |  SomeValue   |  AnotherVal  |   ThirdVal   |   FourthVal  |  FifthVal

This means that although there would be more columns in the table, I would avoid having to search through multiple tables for each row in the first table, and the data would be more easily accessible too (for filtering and things like that). Most of the 'loading' would be done at the start, and hopefully sped up with methods for transactions.


Overview:

To summarise, I want to speed up reading from an SQLite database with multiple tables, where I have to look through these tables for each row of the first table in order to produce the desired result. This takes a long time, and is inefficient, but I'm not sure if there is a way I can adjust my current method to greatly improve read speeds. I think I should 'load' the data when the app is first run, by reorganising the data from various tables into one table.

So I am asking, which of the two methods is better (mostly concerning performance)? Is there a way I can adjust my current method or is there something I am doing incorrectly? Finally, if there is a better way to do this than the two methods I have already mentioned, what is it and how would I go about implementing it?

Farbod Salamat-Zadeh
  • 19,687
  • 20
  • 75
  • 125
  • For those who have voted to close and/or downvoted this question, I don't see how this is too broad. I am asking for the best way to do achieve what I have outlined above, and this could be either by modifying the first method, using the second, or something different. I understand there may not be a single definite answer as people may have different answers to this, but how else should I ask what the best way is to achieve something? I've added details to my question, and I disagree that "good answers would be too long in this format". – Farbod Salamat-Zadeh Jun 24 '16 at 22:12
  • I think some users may have voted this as "too broad" because you explained your situation only using "table1" and "table2." We are not able to advise whether this is a wise idea or not if we don't know what each table has. You could give us some examples. You could use dummy data as an example, but the entities should be the same or similar category. Whether or not you should combine two tables is a sticky situation. You don't want to have to add columns again and again and so on. – Sometowngeek Jun 27 '16 at 00:09
  • @Sometowngeek I can see your point, but I did also mention that these were there merely as an example and that my actual data contains around 50 tables with far more columns and usually around 1000 rows. I don't think it matters what exactly the data is but it does matter how much of it there is, how it is organised, and what I am doing with it, as my question does not concern the data but how I am dealing with it. However, if it would help, let me know and I will update my answer. Hope I didn't sound too defensive or anything ;) – Farbod Salamat-Zadeh Jun 27 '16 at 00:18
  • On the top of my head, I think the fastest way is to dump data via SQL query into an output file like Excel spreadsheet for example, then create a new table. After that, import the output file data into the new table. Make sure to verify the data is correct before deleting the old tables, though. If two tables are linked together by a FK, this can be done easily. – Sometowngeek Jun 27 '16 at 00:27
  • @Sometowngeek What do you mean by "_import the output file data into the new table_"? How exactly would I do this for optimal performance (this is precisely what my question is asking)? – Farbod Salamat-Zadeh Jun 27 '16 at 03:02
  • @FabrodSalamat-Zadeh, It's just what I am thinking of on the top of my head. I'll try coming up with a suggestion when I get home from work since this will take some thinking :-) – Sometowngeek Jun 27 '16 at 12:50
  • @FabrodSalamat-Zadeh: Sorry I haven't had a chance to look at it last night. After doing some quick searching, I think I may have found something. Would [Catalin's suggestion](http://stackoverflow.com/a/725571/3993154) work? – Sometowngeek Jun 28 '16 at 12:30
  • @Sometowngeek It's a good answer, but for a different question. I'm asking what the best method is for structuring my SQLite databases in my app considering performance speed. As I explained in my question, I have tried various methods already of how I should use databases for my requirements, but I'm not sure whether to use one of the two I have written about, or a completely different way. Although I could merge the tables in my database (as I wrote about in my first/original method), I'm not sure whether or not this is the best option considering performance. – Farbod Salamat-Zadeh Jun 28 '16 at 13:09
  • I'm not able to come up with a better answer. :-\ – Sometowngeek Jun 29 '16 at 13:04
  • possible duplicate of http://stackoverflow.com/questions/28188164/android-sqlite-performance – ugur Jun 30 '16 at 20:41
  • @UğurB No, it is not a duplicate - I am asking a different question. I am asking how to transfer/store data which would be best for performance in my app, the answer you have shown talks about inserting data into SQLite. Furthermore, you should be able to see that I talk about transactions in my question (because I had already read that before asking this question). – Farbod Salamat-Zadeh Jun 30 '16 at 21:00
  • @FarbodSalamat-Zadeh: could by joining table in query will help you?. So instead loading another object for the next data, create a big object that store the whole data, both the main table and the other table. fyi, creating object is expensive in Android. – ישו אוהב אותך Jul 01 '16 at 14:11
  • @isnotmenow Maybe, but I'm going to try _Danail_'s answer below first. – Farbod Salamat-Zadeh Jul 01 '16 at 14:12
  • 1
    Apparently, my comment is already in @danail-alexiev first point. You should follow his advice. – ישו אוהב אותך Jul 01 '16 at 14:18

2 Answers2

2

A couple of things that you should try:

  • Optimise your loading. As far as I understood your current method, it runs into the N + 1 queries problem. You have to execute a query to get the first batch of data, and then another query for every row of the original result set, so you can fetch the related data. It's normal that you get a performance problem with that approach. I don't think it's scalable and I would recommend you move away from it. The easiest way is to use joins instead of multiple queries. This is referred to as eager loading.
  • Introduce appropriate indexes on your tables. If you are performing a lot of joins, you should really think about speeding them up. Indexes are the obvious choice here. Normally, primary key columns are indexed by default, but foreign keys are not. This means that you perform linear searches on the your tables for each join, and this is slow. I would try and introduce indexes on your foreign key columns (and all columns that are used in joins). Try to measure the performance of a join before and after to see if you have made any progress there.
  • Consider using database views. They are quite useful when you have to perform joins often. When creating a view, you get a precompiled query and save quite a bit of time compared to running the join each time. You can try executing the query using joins and against a view and this will show how much time you will save. The downside of this is that it is a bit harder to map your result set to a hierarchy of Java objects, but, at least in my experience, the performance gain is worth.
  • You can try and use some kind of lazy loading. Defer loading the related data unless it is being explicitly requested. This can be hard to implement, and I think that it should be your last resort, but it's an option nevertheless. You may get creative and leverage dynamic proxies or something like this to actually perform the loading logic.

To summarise, being smart with indexes / views should do the trick most of the time. Combine this with eager / lazy loading, and you should be able to get to the point where you are happy with your performance.

EDIT: Info on Indexes, Views and Android Implementation

Indexes and Views are not alternatives to the same problem. They have different characteristics and application.

When you apply an Index to a column, you speed up the search on those column's values. You can think of it as a linear search vs. a tree search comparison. This speeds up join, because the database already knows which rows correspond to the foreign key value in question. They have a beneficial effect on simple select statements as well, not only ones using joins, since they also speed up the execution of where clause criteria. They come with a catch, though. Indexes speed up the queries, but they slow down insert, update and delete operations (since the indexes have to maintained as well).

Views are just precompiled and stored queries, whose result sets you can query just like a normal table. The gain here is that you don't need to compile and validate the query each time.

You should not limit yourself to just one of the two things. They are not mutually exclusive and can give you optimal results when combined.

As far as Android implementation goes, there is not much to do. SQLite supports both indexes and queries out of the box. The only thing you have to do is create them. The easiest way is to modify your database creation script to include CREATE INDEX and CREATE VIEW statements. You can combine the creation of a table with the creation of a index, or you can add it later manually, if you need to update an already existing schema. Just check the SQLite manual for the appropriate syntax.

Danail Alexiev
  • 7,624
  • 3
  • 20
  • 28
  • Thanks for the answer. I've not heard of most of these concepts before, as I have mainly been working with Android and not much with databases yet. I understand the first point. I've not heard of indexes before, and after [some research](http://www.tutorialspoint.com/sqlite/sqlite_indexes.htm), I was unsure what the difference is. I found [a SO answer](http://stackoverflow.com/a/769422/4230345) about the difference but I'm still unsure. I now understand views a bit better after [reading about them](http://stackoverflow.com/a/6015246/4230345). Finally, how would I implement this in Android? – Farbod Salamat-Zadeh Jul 01 '16 at 12:47
  • @FarbodSalamat-Zadeh See my edit with some additional info. It's a bit too long for a comment. – Danail Alexiev Jul 01 '16 at 12:56
  • Thank you for your update, Danail - I have a few more questions. To clarify, should I be using indexes on my relational tables (so that joining is faster)? How would I query a view in Android - normally, I would use `cursor.query(...)` passing the name of the table as one of its parameters (what would the table name be here)? Also, to double-check, does this mean I should continue to use the [android-sqlite-asset-helper](https://github.com/jgilfelt/android-sqlite-asset-helper) library so that I can copy a `.db` file into my app, thus avoiding creating the database from CSV files in the app? – Farbod Salamat-Zadeh Jul 01 '16 at 13:34
  • Yope, indexes on the tables (foreign key columns to speed up joining). Querying a view is the same as querying a table - you just pass the view name as the table name. It works in the same way. You can even specify where clauses and everything. I haven't used the library you are referring to. If you can modify the table creation scripts to include the indexes and views, everything should be ok. I think that means that to continue to use the library, you should modify the .db file before inserting you as an asset. – Danail Alexiev Jul 01 '16 at 14:08
  • Thanks again, Danail - I'll try out this method now. Normally, I would award the bounty after trying out the idea, but your suggestions are clearly detailed, and since there's not much time left to award the bounty, I'll award it now. If I have any questions while implementing this, I'll let you know. – Farbod Salamat-Zadeh Jul 01 '16 at 14:20
0

maybe try this : https://realm.io/products/java/ i never use it, i know nothing about their performance. It can be a way that can interest you.. or not ;)

Lucsartes
  • 321
  • 2
  • 13
  • It looks okay, but I don't want to spend lots of time implementing it into my app then finding out there is little performance improvement. – Farbod Salamat-Zadeh Jun 30 '16 at 17:36
  • Yeah, i can understand. I have a other solution, but i don't know (again ^^) if you can impement that. Imagine you must join on 100 rows, il you split you sql in 10 subQuery (join first 10, next 10, next 10, etc.), you can maybe launch 10 asynch request. It will be 10* faster that one request. And at the end, merge the 10 results arrays. – Lucsartes Jul 04 '16 at 15:59
  • Thanks for the alternative solution. I've decided to use _Danail_'s solution above, but asynchronously adding rows to the database doesn't sound too bad either. – Farbod Salamat-Zadeh Jul 04 '16 at 16:03