1

I have a table with around 50 million rows.

Tablename: iddetails

Columns: nid, mid, pid, cid

Unique Key : Combination of mid, pid, cid columns

I need to load this data unto java application and perform search operations

My approach: Represent the data as a List of Maps.

List<Map<String, Long>> mList = new ArrayList<>();

To Search for any of mId, pId, cId and retrieve the nId

for (Map<String, Long> mp : mList) {
if(mp.get("pId")==99999) {
    System.out.println("nId : "+mp.get("nId"));
    System.out.println("mId : "+mp.get("mId"));
    System.out.println("pId : "+mp.get("pId"));
    System.out.println("cId : "+mp.get("cId"));
    break;
}
}

This solution is working.

But I want to know is: Are there any better approaches than this, performance wise.

Edit: nId, instead of name.

MChiru
  • 79
  • 7
  • Why don't you write a query with a `WHERE pid = 99999` ? That would be the most efficient way rather than loading the whole database from your code... – assylias Oct 31 '19 at 11:38
  • I need to process large volumes of records and have to fill the missing fields by getting this data. I feel, if you have sufficient memory, search of local data is better than a dB query for each record – MChiru Oct 31 '19 at 12:54
  • In most situations, it's better to use queries to do database stuff rather than load the data, manipulate it and write it back to the DB. For example you mention filling the missing fields - you may be able to do that with a query (depending on your DB) which will most likely be much more efficient. – assylias Oct 31 '19 at 19:34

3 Answers3

1

In you example you are looping over 50 mln list entries when searching, which is definitely not the fastest approach. If you understand how the data is accessed you can speed the process up, just like you would normally do by adding the right index to a database table.

For example if you know that you will be searching by the pId column often you could replace the List with another Map where the key is an pId.

Map<Long, Map<String, Long>> data = ...
Map<String, Long> mp = data.get(99999);
System.out.println("Name : " + mp.get("name"));
System.out.println("mId : " + mp.get("mId"));
System.out.println("pId : " + mp.get("pId"));
System.out.println("cId : " + mp.get("cId"));

Take a look at this question where you will find few more solutions.

Karol Dowbecki
  • 43,645
  • 9
  • 78
  • 111
1

It's not working: name is a String, not a long. You'd be much better off defining a class to represent the rows:

public class Row {
    private String name;
    private long mId;
    private long pId;
    private long cId;
}

and a Map<Long,Row> to access a row by pId. That said, you should consider leaving your data in the DB and accessing it with JDBC.

Maurice Perry
  • 9,261
  • 2
  • 12
  • 24
  • I need to process large volumes of records and have to fill the missing fields by getting this data. I feel, if you have sufficient memory, search of local data is better than a dB query for each record. – MChiru Oct 31 '19 at 12:53
  • As mentioned in the question, Search can be based on mId, pId or cId. – MChiru Oct 31 '19 at 13:02
  • @MChiru what you're doing is the work of a dbms, As for the field used for search, you will need a map for each of these fields. – Maurice Perry Oct 31 '19 at 13:05
  • I was thinking along these lines: 3 Maps for mId, pId and cId. Shall try with this option. – MChiru Oct 31 '19 at 13:09
  • @MChiru but note that if one of these fields is not unique, you will need a `Map>` – Maurice Perry Oct 31 '19 at 13:13
  • Yes @Maurice Perry. Have accounted for that. mId, pId and cId are unique. – MChiru Oct 31 '19 at 13:26
  • My apologies for marking this answer as accepted, so late. I had followed your suggestion. Created 3 Maps for mId, pId and cId. Loading the Maps at the startup, around 1 Million records. Data gets updated once every day, when no operations are done. I've compared this approach with JDBC search, but I felt for this use case, the above approach is fine and am going with it. – MChiru May 24 '20 at 14:47
1

I still think it is a better approach to search the DB directly than loading the data into the application and then searching it in memory.

Firstly you'll need to have a lot of space if you want to load all of your 50 million records at once or most of the records at once. And if you cannot load all of the records at once then it will add more overhead.

And second thing is why would you want to load all the records at once? If you are not going to use all the loaded records then the time taken to load all those unwanted records is wasted. You can know about the pros and cons of that approach here

So it is a better approach to just query the database with the given fields. And if you think there would be too many db hits then maybe generalize the query a bit more and get a bit more of data than strictly needed. Also if you index the database fields properly the searching will be very fast.

So considering your database grows and memory will be at short most of the time, I recommend querying the DB directly. But if you really want to use your approach then just post your solution afterwards.

  • Thanks Nandan.. My apologies for responding so late. Created 3 Maps for mId, pId and cId. Loading the Maps at the startup, around 1 Million records. Data gets updated once every day, when no operations are done. I felt for this use case, the above approach suggested by @Maurice Perry is fine and am going with it. – MChiru May 24 '20 at 14:50