
I have an application that fetches a large dataset, roughly 160,000 rows, from a SQLite DB using Java JDBC. (Note that I'm running on a small-scale system: Atom processor, 1 GB of RAM, SQLite DB.) Each row returned by the query is a timestamp (INTEGER) and a value (REAL).

I need to do processing on this data in another thread, so I put it into a `List<HashMap<String, Object>>` for easy usage via the function below; this list is then passed to the other thread for processing. The creation of this `List<HashMap<String, Object>>` seems to take quite some time, roughly 20 seconds per query.

Is there a more efficient way to do this?

public List<HashMap<String, Object>> resultSetToArrayList(ResultSet rs) throws SQLException
{
    ResultSetMetaData md = rs.getMetaData();
    int columns = md.getColumnCount();
    List<HashMap<String, Object>> list = new ArrayList<HashMap<String, Object>>();

    while (rs.next())
    {
        HashMap<String, Object> row = new HashMap<String, Object>(columns);
        for (int i = 1; i <= columns; ++i)
        {
            row.put(md.getColumnName(i), rs.getObject(i));
        }
        list.add(row);
    }
    return list;
}

After the comments below, I improved this to use a `Result` object instead. The improved function is below, along with the class definition. These changes reduced the processing time by about 8 seconds.

public List<Result> resultSetToArrayList(ResultSet rs) throws SQLException
{
    List<Result> list = new ArrayList<Result>();

    while (rs.next())
    {
        list.add(new Result(rs.getLong(1), rs.getObject(2)));
    }
    return list;
}

public class Result
{
    long timestamp;
    Object value;

    public Result(long timestamp, Object value)
    {
        this.timestamp = timestamp;
        this.value = value;
    }

    public long getTimestamp()
    {
        return timestamp;
    }

    public void setTimestamp(long timestamp)
    {
        this.timestamp = timestamp;
    }

    public Object getValue()
    {
        return value;
    }

    public void setValue(Object value)
    {
        this.value = value;
    }
}
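For reference, here is a sketch of the further-tweaked variant I tried based on the comment suggestions below: a primitive `double` for the value (this assumes the REAL column is never NULL) and a presized list. The `expectedRows` parameter is only illustrative; in my case it would have to come from a separate `SELECT COUNT(*)`.

public List<Result> resultSetToArrayList(ResultSet rs, int expectedRows) throws SQLException
{
    // Presizing skips the incremental ArrayList growth mentioned in the comments
    List<Result> list = new ArrayList<Result>(expectedRows);

    while (rs.next())
    {
        // getDouble(2) avoids boxing; assumes column 2 is a non-NULL REAL
        list.add(new Result(rs.getLong(1), rs.getDouble(2)));
    }
    return list;
}

public class Result
{
    final long timestamp;
    final double value;   // primitive instead of Object, per the comments

    public Result(long timestamp, double value)
    {
        this.timestamp = timestamp;
        this.value = value;
    }

    public long getTimestamp()
    {
        return timestamp;
    }

    public double getValue()
    {
        return value;
    }
}

As noted in the comments below, neither change made a noticeable difference in my runtime, though the primitive `double` should at least reduce per-row memory.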
  • So it takes 20 seconds to retrieve 160k rows of data. That doesn't sound awfully bad. What kind of throughput are you expecting? – Mick Mnemonic Jul 06 '17 at 19:28
  • No, the query itself returns the result set in a second or two, but reading the result set and putting it into the `List<HashMap<String, Object>>` takes 20 seconds. I think that part is excessive; it should take only a fraction of that time. I'm looking for suggestions to improve the efficiency of building that storage from the result set: different data structures, a more efficient way of populating from the result set, anything to reduce that time. – Tacitus86 Jul 06 '17 at 19:30
  • I would try to determine and then set an initial size: https://stackoverflow.com/questions/15430247/why-start-an-arraylist-with-an-initial-capacity – Marged Jul 06 '17 at 19:35
  • Don't use a `HashMap` to store the row as name/value pairs. Java is an Object-Oriented language. Use it. Create a class with fields matching the result columns, then build a list of those. Using a `HashMap` is a humongous waste of memory. – Andreas Jul 06 '17 at 19:40
  • Andreas, that should be an easy change. I will try it. – Tacitus86 Jul 06 '17 at 19:42
  • @Marged Setting the initial size has virtually no impact on performance here. – Andreas Jul 06 '17 at 19:43
  • @Andreas Why ? He is creating a 160K list that needs to be grown several times. That can be eliminated. – Marged Jul 06 '17 at 19:55
  • @Andreas, following your suggestion reduced the processing time to 12 seconds from the original 20. Anything else we can do? – Tacitus86 Jul 06 '17 at 19:58
  • @Marged, would that save me more time than I would lose by iterating through the result set once to find the length? – Tacitus86 Jul 06 '17 at 19:59
  • @tacitus86 we don't know your database and select, determining the size can be as easy as `select count(*)`. And please specify what can be altered in your code, nobody knows what the receiving function expects. My first assumption was that the `List` returned is compulsory. Perhaps your algorithm can even be expressed in SQL ;-) – Marged Jul 06 '17 at 20:00
  • @Marged The issue is that, because this system is so low-end and the DB so large, any DB interaction seems to use massive amounts of system resources. I could call a count, but it would be just as taxing as running the original query again. I'm unsure what you mean by "what the receiving function expects"? I mentioned in the OP that it will always be a ResultSet containing X rows with a long timestamp and an Object value per row. Hope that answers what you are asking. – Tacitus86 Jul 06 '17 at 20:06
  • @Andreas I have added my changed function to the OP. – Tacitus86 Jul 06 '17 at 20:09
  • @Marged Creating 160K `HashMap`, adding 320K entries to them (2 per), takes a hell of a lot longer than re-sizing the list a few times (14 times maybe?). I don't think you could measure the difference, even if you tried. – Andreas Jul 06 '17 at 20:38
  • @Tacitus86 Don't use `Object` as second value. You said it was a SQLite `REAL`, which is a Java `double`. Use that. `Object` means that it would be a `Double`, i.e. a boxed `double`, taking up a lot more space. This assumes the value cannot be `NULL`, of course. – Andreas Jul 06 '17 at 20:42
  • @Marged Sorry, 14 resizes was wrong. With initial size of 10, and growth of 50%, it'll take 24 resizes to get to capacity of 160065. – Andreas Jul 06 '17 at 20:50
  • I will try both suggestions when I get to the office tomorrow to see if there is a noticeable improvement. I was really hoping this would take on the order of 1 second or less, tbh. – Tacitus86 Jul 06 '17 at 20:59
  • @Andreas I wrote a small micro-benchmark that simply adds 160K `Integer`s to a list, and the difference can be measured: about 20%. But the overall processing time is well below a second, so I think we can forget about this suggestion ;-). I was expecting more gain. Anyway, my first understanding was that the way the data is stored must not be changed, and because it *can* be changed, your approach is the much better one. – Marged Jul 06 '17 at 21:09
  • @Marged Thanks for checking. Yeah, the data format can be changed. As long as it comes in as a ResultSet and exits in some useful object form that I can access, it's all open. – Tacitus86 Jul 06 '17 at 21:12
  • @Tacitus86 I don't know if SQLite has comparable problems, but Oracle needs to have its fetch size tuned. SQLite is not a client-server database, but perhaps even here you can reduce the number of roundtrips (a sketch follows after these comments). Just a guess. – Marged Jul 06 '17 at 21:12
  • I think it would be interesting to know what your "other thread" does with the data. Perhaps that processing could be turned into a SQL statement. – Marged Jul 06 '17 at 21:14
  • It scales the result data down to a requested number of min/max samples over equal time periods, in addition to publishing the scaled values to RTI's DDS. – Tacitus86 Jul 06 '17 at 21:17
  • I've tried moving the heavy lifting to SQL, and it actually took longer on this database, believe it or not. – Tacitus86 Jul 06 '17 at 21:24
  • Tried these today and didn't get any noticeable difference from either suggestion: setting the initial list size or changing the type of value from Object to double (possibly a small memory improvement there, so I'll leave it in). I did notice one other thing: when I run with 70k points, this runs in 1 or 2 seconds, whereas with 130-170k points it takes around 15 seconds. Why such a huge increase? – Tacitus86 Jul 07 '17 at 17:33
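For completeness, a sketch of the fetch-size tuning mentioned in the comments above. Per the JDBC docs, `setFetchSize` is only a hint, and since SQLite is embedded the driver may simply ignore it; the `samples` table and its column names here are placeholders, and `resultSetToArrayList` is the function from the question.

List<Result> loadSamples(Connection conn) throws SQLException
{
    // "samples", "timestamp", "value" are placeholder names for illustration
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT timestamp, value FROM samples"))
    {
        ps.setFetchSize(1000);  // JDBC hint only; an embedded SQLite driver may ignore it
        try (ResultSet rs = ps.executeQuery())
        {
            return resultSetToArrayList(rs);
        }
    }
}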

0 Answers