0

This is a general question as to what kind of list needs to be used to accumulate data while looping through a JDBC ResultSet object.

I work for a project where certain queries return thousands of records. SQL queries are executed using JDBC methods and the resultant values are looped through a ResultSet. The SQL query produces a sorted order.

List<Company> list = new ArrayList<>();
ResultSet rs = statement.execute();
while (rs.next()){ 
 Company company = new Company();
 rs.getString("company_name");
 rs.getInteger("company_id");
 list.add(company);
}

This list is returned to the service layer and data binding is done through JAXB or Jackson ObjectMapper to convert it to an XML format or a JSON object and returned as a part of REST call.

There are ~200 such code snippets where an ArrayList is used to accumulate the resultant data.

Query: Since the list is getting structurally modified with every addition of a new data item, will using a LinkedList instead of ArrayList make any difference? Which one of it can be used to improve the performance?

Note that the list is not manipulated else where in the application. Data is inserted in the data layer and is transformed in the service layer.

Sara
  • 603
  • 8
  • 19
  • linkedList uses doubly linked list so u iterate in the forward and backward direction. And while using arrayList u can only iterate in the forward direction. So , depending on your requirement u can decide which one to use. – Karamveer Gahlot Sep 18 '21 at 10:02
  • @KaramveerGahlot I have clearly mentioned my requirement in the post. Apart from the basic difference between these two kinds of lists, I am looking at an answer from a performance perspective. – Sara Sep 18 '21 at 10:08
  • 1
    Please refer to this question. https://stackoverflow.com/questions/322715/when-to-use-linkedlist-over-arraylist-in-java?rq=1 – Karamveer Gahlot Sep 18 '21 at 10:09
  • 3
    LinkedList is generally slower and more memory heavy in near every use case. ArrayList does not get structurally modified each iteration - otherwise `add` would be `O(n)` not amortised `O(1)`. It’s very likely that LinkedList will be slower - but the only way to know would be to **benchmark**. – Boris the Spider Sep 18 '21 at 10:15
  • 2
    @KaramveerGalot your statement is obviously untrue. Of course you can iterate an array in both directions!! – Boris the Spider Sep 18 '21 at 10:18
  • @BoristheSpider Benchmark is usually done by the architects or other senior personnel in my organization. I do not have any authority on it, esp on production server. I'll still try to use it in my dev environment to atleast get a gist of it. Thank you for the idea. – Sara Sep 18 '21 at 10:28
  • 1
    If you are interested in performance, you should consider using JDBC rather than list types. [Improve JDBC Performance](https://stackoverflow.com/questions/25850569/improve-jdbc-performance) –  Sep 18 '21 at 10:46
  • @saka1029 adding records to a map seems more sensible. I could use that, perhaps. Also, there are certain queries which do not have any uniquely identifying column value - in such cases, list is the option or could I use a integer incrementor and still use a map? – Sara Sep 18 '21 at 11:23
  • 1
    Given that the data is being read from a DB (using IO) and then returned as an HTTP result (using IO) the intermediate in-memory structure chosen probably has little impact on performance. If you are returning thousands of records, consider using paging, or perhaps the client can search more narrowly. Perhaps you could stream directly from the ResultSet to JSON. – tgdavies Sep 18 '21 at 11:56
  • @sara A Map is useful if you want to look by using a key that is either non-sequential or not an integer. For example, a Map would be useful if you were getting employee records from the database and wanted to access records in your application layer by an employee identifier. If you are sequentially iterating over the records, a list is appropriate. – Jim D Sep 18 '21 at 11:57
  • @tgdavies stream directly from the ResultSet to JSON - Are you referring to use a Web Socket? – Sara Sep 18 '21 at 14:25
  • Just rendering each row in the ResultSet directly to the Response's OutputStream, with no intermediate storage. – tgdavies Sep 18 '21 at 21:26

1 Answers1

2

ArrayList is likely the best choice for this use case.

If you need to squeeze more performance, you should use the profiler to determine the bottleneck. I would bet money that increasing the fetch size would have the biggest impact on improving performance. See https://docs.oracle.com/cd/A87860_01/doc/java.817/a83724/resltse5.htm

The comment from @tgdavies about streaming directly to JSON is an excellent one. Avoiding the creation of an intermediate data structure would have a significant performance improvement. The downside is that this would be an API change for your application, which may be a bureaucratic process that takes a long time to coordinate.

Another option is preallocating the ArrayList based on the average number of records returned, but you should definitely only do that if the profiler identifies that as an issue.

Jim D
  • 326
  • 1
  • 7
  • does streaming directly to JSON mean using a web socket? – Sara Sep 18 '21 at 14:26
  • I was thinking more along the lines of implementing a Stream interface. Alternatively you could have some sort of functional interface (perhaps a Consumer from java.util.function) where a consumer is provided to the database fetch method. In fact, you are almost there with your current implementation. If you define a method with a consumer interface, you could even pass a List object's add method to it. Your current method would create a list, pass the add object's add method to your new method. See https://docs.oracle.com/javase/tutorial/java/javaOO/methodreferences.html – Jim D Sep 18 '21 at 14:59