1

I am currently working on a program which allows a user to search through a very large collection (~100,000 objects) of trading cards and select cards of their choice to add to a deck file.

My question is, what is the most efficient way to store these objects for optimal search time? I need to be able to search each object for multiple possible values( card information fields such as name, type, rules text, etc) which match the given search string input and return all cards which match the search string.

Any suggestions will be appreciated.

Ratu
  • 29
  • 5
  • 1
    Embedded database? Something like HSQL or SQLite. – Thilo Sep 11 '13 at 02:44
  • A database for all the fields and then an [inverted index](http://en.wikipedia.org/wiki/Inverted_index) for searching. – Sotirios Delimanolis Sep 11 '13 at 02:45
  • @Thilo - What about `Derby` (aka `JavaDB`)? – PM 77-1 Sep 11 '13 at 02:46
  • @Thilo - How do I go about embedding a database? I've not worked with them much before and everything I've read points to having the database on a separate server? – Ratu Sep 11 '13 at 02:46
  • 1
    Embedded mean it's a file on the local disk to store the database and a jar file contained in your application to access that file. No separate server (or even process). Check out the introduction articles to Derby(JavaDB) or HSQL. – Thilo Sep 11 '13 at 02:48
  • 1
    Rundown of popular options: http://stackoverflow.com/questions/57102/embedded-java-databases – Thilo Sep 11 '13 at 02:49
  • Awesome, thank you. I will have a look at those. If anyone else has other possible solutions, please feel free to share. – Ratu Sep 11 '13 at 02:50
  • 1
    Use a embedded database you can use `lucene` with – nachokk Sep 11 '13 at 03:06
  • 1
    The three answers (as I write this) are all valid approaches, you just have to decide whether you want to build or buy. Also the different approaches will scale differently. It's all good, but you need to work through your other requirements, too. – Chris Gerken Sep 11 '13 at 15:32

3 Answers3

2

I'd look at Elasticsearch (my preference) or mongoDB. Both are json document stores optimized for both search and easy storage. They're both open source projects with easy-to-use java clients API's. It should be an easy step to store your card data as a JSON object and then persist those objects in either store.

Chris Gerken
  • 16,221
  • 6
  • 44
  • 59
2

Since you're allowing search based on text -- and I'm guessing this means they can match a substring rather than having to specify the whole string to match -- it's a much more difficult problem than if every field had a well-defined set of possible values. If it were playing cards, the suit would be one type and the other would be rank. In that case you could maintain a set of cards for each attribute value. Like, Set<Card> hearts, Set<Card> clubs, Set<Card> threes, etc. If there are any fields of that sort where you could have them select from a drop-down list that would greatly reduce search time. (Give me the set of sixes intersect the set of clubs, now search those for text that matches X.)

For the fields that do need to be text-searchable, it may be a good idea to keep an index based on each word in the value. For instance, if one card's "character" value contained "Luke Skywalker, Jedi Knight" (Card id 96) and another was "Mace Windu, Jedi Master" (Card id 97) then you would keep a data structure somewhat like this:

Map<String, Set<Cards>> characterTerms

with (K,V) entries like this:

"luke" -> [96]
"skywalker" -> [96]
"jedi" -> [96,97]
"knight" -> [96]
"mace" -> [97]
"windu" -> [97]
"master" -> [97]

Then when a search is submitted for "Skyw*" you could iterate over the keys in the characterTerms map to see which one have substrings of "skyw". In this case, the second entry. So you take that set of cards and see which ones match the rest of the criteria specified.

A good library for doing this sort of full-text search is Apache Lucene.

UFL1138
  • 632
  • 3
  • 10
  • 1
    Rather than iterate over them, sort the keys in `characterTerms`. If you want to support matching "ace" to "mace", it can be a slower secondary search. – clwhisk Sep 11 '13 at 03:47
  • Good point. I was assuming you'd always be doing a partial match. – UFL1138 Sep 11 '13 at 14:27
1

Use an embedded H2 database engine to hold and search your cards.

http://www.h2database.com/

tbsalling
  • 4,477
  • 4
  • 30
  • 51