5

I have a Java program that needs to iterate through a HashMap to get a parameters that are then used to query the MySQL database.

The code is as follows:

Iterator<Entry<String, Double>>it = ws.entrySet().iterator();
Connection con = null;

while(it.hasNext())  
{
    Entry<String, Double>pairs = it.next();
    PreparedStatement ps = con.prepareStatement("select doc_freq from lookup where word=?");
    ps.setString(1, pairs.getKey());
    ResultSet rs = ps.executeQuery();
}

The process of repeatedly accessing the database for every iteration of the loop (which is about 500 times) is slowing down my application. Is there any way I can send all these parameters at once so that I access the database only once?

Roman C
  • 49,761
  • 33
  • 66
  • 176
jayanth
  • 110
  • 1
  • 9
  • You could create a temporary table and save the `pairs.getKey()` values inside it, then use a `JOIN` statement bewteen your table and your temporary table to retrieve the data. Of course, don't forget to delete the temporary table when finish using it (just in case the RDBMS doesn't handle it for you). – Luiggi Mendoza Apr 05 '13 at 19:18
  • Is the query slow? Where your processing loop spend most of the time? – dcernahoschi Apr 05 '13 at 19:25
  • 1
    Read this: [Batching Select Statements in JDBC](http://www.javaranch.com/journal/200510/Journal200510.jsp#a2) – informatik01 Apr 05 '13 at 21:23

3 Answers3

1

Considering ws is a map, you can do a single query that way:

Connection con = getConnection();
Set<String> ks = ws.keySet();

if (ks.size() > 0) {
    StringBuilder inStatement = new StringBuilder("?");
    for (int i = 1; i < ks.size(); i++) {
        inStatement.append(", ?");
    }

    PreparedStatement ps = con.prepareStatement("select doc_freq from lookup where word in (" + inStatement.toString() + ")");

    int k = 1;
    for (String key : keySet) {
        ps.setString(k++, key);
    }
    ResultSet rs = ps.executeQuery();
}
ngasull
  • 4,206
  • 1
  • 22
  • 36
  • RDBMS have slow performance on this kind of queries. It is always better to use `id = ? or id = ?` concatenation. Still, a `JOIN` is much better that all these alternatives. – Luiggi Mendoza Apr 05 '13 at 20:04
  • Aren't `id = ? or id = ?` and `id in (?, ?)` equivalent? Anyway I don't see how a join would help... – ngasull Apr 05 '13 at 21:34
  • Could you give me a reference please? I'm quite interested! – ngasull Apr 05 '13 at 21:35
  • Based on [this Q/A](http://stackoverflow.com/q/3074713/1065197), `IN` seems to be the best way for MySQL, but not for Oracle. There are lot of discussions about this topic on the net (just search *sql in vs or*) so the only answer I can decide is: make your own tests and decide by yourself. Side note: when I worked with DB2 on AS400 last year, I affronted this case, and it result on `OR` owning `IN` clause. – Luiggi Mendoza Apr 05 '13 at 21:39
  • Thanks, I didn't know MySQL wasn't doing an optimization over this. Seems to be the same for oracle tought. – ngasull Apr 05 '13 at 21:51
  • Thank you, I tried this method and it has reduced my running time from minutes to seconds. – jayanth Apr 06 '13 at 09:07
0

Prepare the statement once then iterate and set parameter to it then execute. This is from javadoc

A SQL statement with or without IN parameters can be pre-compiled and stored in a PreparedStatement object. This object can then be used to efficiently execute this statement multiple times. This method is optimized for handling parametric SQL statements that benefit from precompilation. If the driver supports precompilation...

Iterator<Entry<String, Double>>it = ws.entrySet().iterator();
Connection con = getConnection();

PreparedStatement ps = con.prepareStatement("select doc_freq from lookup where word=?");
while(it.hasNext())  
{
    Entry<String, Double>pairs = it.next();
    ps.setString(1, pairs.getKey());
    ResultSet rs = ps.executeQuery();
}

When the query is compiled to the db it makes it faster to execute and retrieve results one by one, this is the same as executing one query with multiple parameters. performance is equal.

But if you prepare, compile the same SQL query multiple types the db will create the execution plan each time you compile the query this is time offensive.

That's why this techique called efficient in the docs. The term is also know as explain plan and it's created by the db to better optimize the query.

Roman C
  • 49,761
  • 33
  • 66
  • 176
0

It isn't clear from the question why it's slow but one usual problem is the overhead of each transaction.

You can set innodb_flush_log_at_trx_commit to 0 or 2 and everything will go faster. Be warned though that the default value of 1 is the only ACID compliant setting. For most setups 2 is a perfectly fine value.

set global innodb_flush_log_at_trx_commit = 2;
Andreas Wederbrand
  • 38,065
  • 11
  • 68
  • 78