65

I'm looking for the most basic solution to create multiple indexes on a Java Collection.

Required functionality:

  • When a Value is removed, all index entries associated with that value must be removed.
  • Index lookup must be faster than linear search (at least as fast as a TreeMap).

Side conditions:

  • No dependencies on large (like Lucene) libraries. No uncommon or not well tested libraries. No database.
  • A library like Apache Commons Collections etc. would be ok.
  • Even better, if it works with JavaSE (6.0) alone.
  • Edit: No self-implemented solution (thanks for the answers suggesting this - it's good to have them here for completeness, but I already have a solution very similar to Jay's) Whenever several people find out, that they implemented the same thing, this should be part of some common library.

Of course, I could write a class that manages multiple Maps myself (that's not hard, but it feels like reinventing the wheel). So I'd like to know, if it can be done without - while still getting a simple usage similar to using a single indexed java.util.Map.

Thanks, Chris

Update

It looks very much as if we haven't found anything. I like all your answers - the self developed versions, the links to database-like libraries.

Here's what I really want: To have the functionality in (a) Apache Commons Collections or (b) in Google Collections/Guava. Or maybe a very good alternative.

Do other people miss this functionality in these libraries, too? They do provide all sorts of things like MultiMaps, MulitKeyMaps, BidiMaps, ... I feel, it would fit in those libraries nicely - it could be called MultiIndexMap. What do you think?

Chris Lercher
  • 37,264
  • 20
  • 99
  • 131
  • I do not fully understand what you mean by "multiple indexes". Do you want several different lookup methods - one per index - or do you want that e.g. HashMap uses a new Collection (e.g. a hashmap) for those entries with the same hash key? – Thorbjørn Ravn Andersen Mar 26 '10 at 17:10
  • It should be similar to what you can do with a database table: One primary key (e.g. `user_id`), plus other unique indexes (e.g. on `username`). Both can be used to access a user row. When the user is deleted, then the id and the username are removed from the indexes together with the entry. – Chris Lercher Mar 26 '10 at 17:55
  • you just defined a in memory database, which your requirements state you don't want. –  Mar 29 '10 at 20:25
  • 1
    Related - "How to implement a Map with multiple keys?" http://stackoverflow.com/questions/822322/how-to-implement-a-map-with-multiple-keys. No clear solutions there either - it looks like some of the answers are for using a map with a composite key, but the OP there indicated he wants to be able look up by a single-value key. Nevertheless, seems like a good idea to link the questions in the unlikely event a viable answer turns up over there later. Also, if the OP here decides on a custom solution, this related question provides some additional ideas on potential implementations. – Bert F Apr 02 '10 at 13:44
  • When a Value is removed, all index entries associated with that value must be removed. CHECK! Boon Repo does this. Index lookup must be faster than linear search (at least as fast as a TreeMap). CHECK Boon Repo does this! Side conditions: No dependencies on large (like Lucene) libraries. No uncommon or not well tested libraries. No database. Boon depends on nothing! CHECK! Boon Repo does all of this. And a lot more, and it is small and tight. :) http://richardhightower.github.io/site/Boon/Welcome.html – RickHigh Nov 02 '13 at 11:16
  • @ChrisLercher: Being a frequent (ab)user of the [Boost Multi-Index Container](http://www.boost.org/doc/libs/1_55_0/libs/multi_index/doc/index.html), not having that capability in Java seems quite crippling. In the (almost) three years since this question was first posed, has any decent/viable options emerged that are reasonably stable? (continued...) – decimus phostle Jan 22 '14 at 19:34
  • I have briefly looked into this and my findings are: i) [CQEngine](https://code.google.com/p/cqengine/) seems to be the most promising at this time. ii) [Guava's Table](https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained#Table) abstraction seems a bit limited. (continued...) – decimus phostle Jan 22 '14 at 19:36
  • iii) Neither did using filtering(say with [JFilter](https://code.google.com/p/jfilter/) or with [Guava Predicates](https://code.google.com/p/guava-libraries/wiki/FunctionalExplained#Predicates)) or using SQL-like([JoSQL](http://josql.sourceforge.net/)) abstractions seem as appealing. iv) [Boon](http://richardhightower.github.io/site/Boon/Welcome.html) seems promising, but at this time seems like a work-in-progress compared to CQEngine. (continued...) – decimus phostle Jan 22 '14 at 19:37
  • Please let me know if you or anyone else has any new insights/knowledge of alternate/better solutions. Thanks. – decimus phostle Jan 22 '14 at 19:37

15 Answers15

24

Each index will basically be a separate Map. You can (and probably should) abstract this behind a class that manages the searches, indexing, updates and removals for you. It wouldn't be hard to do this fairly generically. But no, there's no standard out of the box class for this although it can easily be built from the Java Collections classes.

cletus
  • 616,129
  • 168
  • 910
  • 942
  • 1
    A Map is tricky though when the indexed values (the key of the map) can be double, e.g. are not unique. Then you'd need Maps with Lists inside. It's the same problem with TreeSets with custom Comparators which was what I was thinking of- in implementation also a TreeMap :) – extraneon Mar 23 '10 at 16:10
  • Good answer. It wouldn't be too hard to create a library class with createIndex/destroyIndex where you pass createIndex a class that does nothing but implement equals and hash. You'd have to store an array of everything that's ever been added so that when someone called createIndex you could quickly add everything in your array into that new hash. destroyIndex would just delete the hash. Meh, doesn't seem worth it. I guess that's why it's never been done. – Bill K Mar 23 '10 at 16:10
  • @cletus: Also not in any (small, common and well tested) library? – Chris Lercher Mar 23 '10 at 16:11
  • 3
    @extraneon: The functionality "Multiple values for one key" is provided by Apache Commons Collections MultiMap https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/MultiMap.html – Chris Lercher Mar 23 '10 at 16:15
  • The multiple values per key problem can be solved in a number of ways: 1. use a multimap type data structure (requires external library like Google Collections) 2. Store buckets (probably a List) at each value or 3. use an alternate structure like a sorted List. – cletus Mar 23 '10 at 23:58
  • 1
    Sets (probably HashSets) are better than Lists for the buckets. You don't need order, and probably do need fast insertion, removal, and uniqueness checking. – TREE Mar 29 '10 at 15:12
  • @cletus: You provided the first answer. At first I doubted it (can't be true ;-), but now I believe it. – Chris Lercher Apr 02 '10 at 16:28
13

Take a look at CQEngine (Collection Query Engine), it's an exact fit for this kind of requirement, being based around an IndexedCollection.

Also see related question How do you query object collections in Java (Criteria/SQL-like)? for more background.

Community
  • 1
  • 1
npgall
  • 2,979
  • 1
  • 24
  • 24
8

My first thought would be to create a class for the thing being indexed, then create multiple HashMap's to hold the indexes, with the same object added to each of the HashMaps. For an add, you'd then simply add the same object to each HashMap. A delete would require searching each HashMap for the reference to the destination object. If deletes need to be fast, you might want to create two HashMaps for each index: one for index-to-value and the other for value-to-index. Of course I'd wrap whatever you do in a class with a clearly-defined interface.

Doesn't seem like this would be hard. If you know the numbers and types of the indexes and the class of the widget up front, it would be pretty easy, like:

public class MultiIndex
{
  HashMap<String,Widget> index1=new HashMap<String,Widget>();
  HashMap<String,Widget> index2=new HashMap<String,Widget>();
  HashMap<Integer,Widget> index3=new HashMap<Integer,Widget>();

  public void add(String index1Value, String index2Value, Integer index3Value, Widget widget)
  {
    index1.put(index1Value, widget);
    index2.put(index2Value, widget);
    index3.put(index3Value, widget);
  }
  public void delete(Widget widget)
  {
    Iterator i=index1.keySet().iterator(); 
    while (i.hasNext())
    {
      String index1Value=(String)i.next();
      Widget gotWidget=(Widget) index1.get(index1Value);
      if (gotWidget.equals(widget))
        i.remove();
    }
    ... similarly for other indexes ...
  }
  public Widget getByIndex1(String index1Value)
  {
    return index1.get(index1Value);
  }
  ... similarly for other indexes ...

  }
}

If you want to make it generic and accept any object, have variable number and types of indexes, etc., it's a little more complicated, but not much.

Jay
  • 26,876
  • 10
  • 61
  • 112
5

You need to check out Boon. :)

http://rick-hightower.blogspot.com/2013/11/what-if-java-collections-and-java.html

You can add n number of search indexes and lookup indexes. It also allows you to efficiently query primitive properties.

Here is an example take from the wiki (I am the author).

  repoBuilder.primaryKey("ssn")
          .searchIndex("firstName").searchIndex("lastName")
          .searchIndex("salary").searchIndex("empNum", true)
          .usePropertyForAccess(true);

You can override that by providing a true flag as the second argument to searchIndex.

Notice empNum is a searchable unique index.

What if it were easy to query a complex set of Java objects at runtime? What if there were an API that kept your object indexes (really just TreeMaps, and HashMaps) in sync.? Well then you would have Boon's data repo. This article shows how to use Boon's data repo utilities to query Java objects. This is part one. There can be many, many parts. :) Boon's data repo makes doing index based queries on collections a lot easier. Why Boon data repo

Boon's data repo allows you to treat Java collections more like a database at least when it comes to querying the collections. Boon's data repo is not an in-memory database, and cannot substitute arranging your objects into data structures optimized for your application. If you want to spend your time providing customer value and building your objects and classes and using the Collections API for your data structures, then DataRepo is meant for you. This does not preclude breaking out the Knuth books and coming up with an optimized data structure. It just helps keep the mundane things easy so you can spend your time making the hard things possible. Born out of need

This project came out of a need. I was working on a project that planned to store large collection of domain objects in-memory for speed, and somebody asked an all to important question that I overlooked. How are we going to query this data. My answer was we will use the Collections API and the Streaming API. Then I tried to do this... Hmmm... I also tired using the JDK 8 stream API on a large data set, and it was slow. (Boon's data repo works with JDK7 and JDK8). It was a linear search/filter. This is by design, but for what I was doing, it did not work. I needed indexes to support arbitrary queries. Boon's data repo augments the streaming API.

Boon's data repo does not endeavor to replace the JDK 8 stream API, and in fact it works well with it. Boon's data repo allows you to create indexed collections. The indexes can be anything (it is pluggable). At this moment in time, Boon's data repo indexes are based on ConcurrentHashMap and ConcurrentSkipListMap. By design, Boon's data repo works with standard collection libraries. There is no plan to create a set of custom collections. One should be able to plug in Guava, Concurrent Trees or Trove if one desires to do so. It provides a simplified API for doing so. It allows linear search for a sense of completion but I recommend using it primarily for using indexes and then using the streaming API for the rest (for type safety and speed).

sneak peak before the step by step

Let's say you have a method that creates 200,000 employee objects like this:

 List<Employee> employees = TestHelper.createMetricTonOfEmployees(200_000);

So now we have 200,000 employees. Let's search them...

First wrap Employees in a searchable query:

   employees = query(employees);

Now search:

  List<Employee> results = query(employees, eq("firstName", firstName));

So what is the main difference between the above and the stream API?

  employees.stream().filter(emp -> emp.getFirstName().equals(firstName)

About a factor of 20,000% faster to use Boon's DataRepo! Ah the power of HashMaps and TreeMaps. :) There is an API that looks just like your built-in collections. There is also an API that looks more like a DAO object or a Repo Object.

A simple query with the Repo/DAO object looks like this:

  List<Employee> employees = repo.query(eq("firstName", "Diana"));

A more involved query would look like this:

  List<Employee> employees = repo.query(
      and(eq("firstName", "Diana"), eq("lastName", "Smith"), eq("ssn", "21785999")));

Or this:

  List<Employee> employees = repo.query(
      and(startsWith("firstName", "Bob"), eq("lastName", "Smith"), lte("salary", 200_000),
              gte("salary", 190_000)));

Or even this:

  List<Employee> employees = repo.query(
      and(startsWith("firstName", "Bob"), eq("lastName", "Smith"), between("salary", 190_000, 200_000)));

Or if you want to use JDK 8 stream API, this works with it not against it:

  int sum = repo.query(eq("lastName", "Smith")).stream().filter(emp -> emp.getSalary()>50_000)
      .mapToInt(b -> b.getSalary())
      .sum();

The above would be much faster if the number of employees was quite large. It would narrow down the employees whose name started with Smith and had a salary above 50,000. Let's say you had 100,000 employees and only 50 named Smith so now you narrow to 50 quickly by using the index which effectively pulls 50 employees out of 100,000, then we do the filter over just 50 instead of the whole 100,000.

Here is a benchmark run from data repo of a linear search versus an indexed search in nano seconds:

Name index  Time 218 
Name linear  Time 3709120 
Name index  Time 213 
Name linear  Time 3606171 
Name index  Time 219 
Name linear  Time 3528839

Someone recently said to me: "But with the streaming API, you can run the filter in parralel).

Let's see how the math holds up:

3,528,839 / 16 threads vs. 219

201,802 vs. 219 (nano-seconds).

Indexes win, but it was a photo finish. NOT! :)

It was only 9,500% faster instead of 40,000% faster. So close.....

I added some more features. They are make heavy use of indexes. :)

repo.updateByFilter(values(value("firstName", "Di")), and( eq("firstName", "Diana"), eq("lastName", "Smith"), eq("ssn", "21785999") ) );

The above would be equivalent to

UPDATE Employee e SET e.firstName='Di' WHERE e.firstName = 'Diana' and e.lastName = 'Smith' and e.ssn = '21785999'

This allows you to set multiple fields at once on multiple records so if you were doing a bulk update.

There are overloaded methods for all basic types so if you have one value to update on each items returned from a Filter:

  repo.updateByFilter("firstName", "Di",
          and( eq("firstName", "Diana"),
          eq("lastName", "Smith"),
                  eq("ssn", "21785999") ) );

Here is some basic selection capabilities:

  List <Map<String, Object>> list =
     repo.query(selects(select("firstName")), eq("lastName", "Hightower"));

You can have as many selects as you like. You can also bring the list back sorted:

  List <Map<String, Object>> list =
     repo.sortedQuery("firstName",selects(select("firstName")),
       eq("lastName", "Hightower"));

You can select properties of related properties (i.e., employee.department.name).

  List <Map<String, Object>> list = repo.query(
          selects(select("department", "name")),
          eq("lastName", "Hightower"));

  assertEquals("engineering", list.get(0).get("department.name"));

The above would try to use the fields of the classes. If you want to use the actual properties (emp.getFoo() vs. emp.foo), then you need to use the selectPropertyPath.

  List <Map<String, Object>> list = repo.query(
          selects(selectPropPath("department", "name")),
          eq("lastName", "Hightower"));

Note that select("department", "name") is much faster than selectPropPath("department", "name"), which could matter in a tight loop.

By default all search indexes and lookup indexes allow duplicates (except for primary key index).

  repoBuilder.primaryKey("ssn")
          .searchIndex("firstName").searchIndex("lastName")
          .searchIndex("salary").searchIndex("empNum", true)
          .usePropertyForAccess(true);

You can override that by providing a true flag as the second argument to searchIndex.

Notice empNum is a searchable unique index.

If you prefer or need, you can get even simple searches back as maps:

  List<Map<String, Object>> employees = repo.queryAsMaps(eq("firstName", "Diana"));

I am not sure if this is a feature or a misfeature. My thought was that once you are dealing with data, you need to present that data in a way that does not ties consumers of data to your actual API. Having a Map of String / basic types seems to be a way to achieve this. Note that the object to map conversion goes deep as in:

  System.out.println(employees.get(0).get("department"));

Yields:

{class=Department, name=engineering}

This can be useful for debugging and ad hoc queries for tooling. I am considering adding support to easily convert to a JSON string.

Added the ability to query collection properties. This should work with collections and arrays as deeply nested as you like. Read that again because it was a real MF to implement!

  List <Map<String, Object>> list = repo.query(
          selects(select("tags", "metas", "metas2", "metas3", "name3")),
          eq("lastName", "Hightower"));

  print("list", list);

  assertEquals("3tag1", idx(list.get(0).get("tags.metas.metas2.metas3.name3"), 0));

The print out of the above looks like this:

list [{tags.metas.metas2.metas3.name3=[3tag1, 3tag2, 3tag3,
3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3,
3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3,
3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3,
3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3,
3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3,
3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3,
3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3,
3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3,
3tag1, 3tag2, 3tag3, 3tag1, 3tag2, 3tag3]},
...

I created several relationship classes to test this:

public class Employee {
List <Tag> tags = new ArrayList<>();
{
    tags.add(new Tag("tag1"));
    tags.add(new Tag("tag2"));
    tags.add(new Tag("tag3"));

}
...
public class Tag {
...
List<Meta> metas = new ArrayList<>();
{
    metas.add(new Meta("mtag1"));
    metas.add(new Meta("mtag2"));
    metas.add(new Meta("mtag3"));

}

}
public class Meta {
 ...
   List<Meta2> metas2 = new ArrayList<>();
   {
       metas2.add(new Meta2("2tag1"));
       metas2.add(new Meta2("2tag2"));
       metas2.add(new Meta2("2tag3"));

   }

}

...
public class Meta2 {



List<Meta3> metas3 = new ArrayList<>();
{
    metas3.add(new Meta3("3tag1"));
    metas3.add(new Meta3("3tag2"));
    metas3.add(new Meta3("3tag3"));

}
public class Meta3 {

...

You can also search by type:

  List<Employee> results = sortedQuery(queryableList, "firstName", typeOf("SalesEmployee"));

  assertEquals(1, results.size());
  assertEquals("SalesEmployee", results.get(0).getClass().getSimpleName());

The above finds all employees with the simple classname of SalesEmployee. It also works with full class name as in:

  List<Employee> results = sortedQuery(queryableList, "firstName", typeOf("SalesEmployee"));

  assertEquals(1, results.size());
  assertEquals("SalesEmployee", results.get(0).getClass().getSimpleName());

You can search by the actual class too:

  List<Employee> results = sortedQuery(queryableList, "firstName", instanceOf(SalesEmployee.class));

  assertEquals(1, results.size());
  assertEquals("SalesEmployee", results.get(0).getClass().getSimpleName());

You can also query classes that implement certain interfaces:

  List<Employee> results = sortedQuery(queryableList, "firstName",      
                              implementsInterface(Comparable.class));

  assertEquals(1, results.size());
  assertEquals("SalesEmployee", results.get(0).getClass().getSimpleName());

You can also index nested fields/properties and they can be collection fields or property non collection fields as deeply nested as you would like:

  /* Create a repo, and decide what to index. */
  RepoBuilder repoBuilder = RepoBuilder.getInstance();

  /* Look at the nestedIndex. */
  repoBuilder.primaryKey("id")
          .searchIndex("firstName").searchIndex("lastName")
          .searchIndex("salary").uniqueSearchIndex("empNum")
          .nestedIndex("tags", "metas", "metas2", "name2");

Later you can use the nestedIndex to search.

  List<Map<String, Object>> list = repo.query(
          selects(select("tags", "metas", "metas2", "name2")),
          eqNested("2tag1", "tags", "metas", "metas2", "name2"));

The safe way to use the nestedIndex is to use eqNested. You can use eq, gt, gte, etc. if you have the index like so:

  List<Map<String, Object>> list = repo.query(
          selects(select("tags", "metas", "metas2", "name2")),
          eq("tags.metas.metas2.name2", "2tag1"));

You can also add support for subclasses

  List<Employee> queryableList = $q(h_list, Employee.class, SalesEmployee.class,  
                  HourlyEmployee.class);
  List<Employee> results = sortedQuery(queryableList, "firstName", eq("commissionRate", 1));
  assertEquals(1, results.size());
  assertEquals("SalesEmployee", results.get(0).getClass().getSimpleName());

  results = sortedQuery(queryableList, "firstName", eq("weeklyHours", 40));
  assertEquals(1, results.size());
  assertEquals("HourlyEmployee", results.get(0).getClass().getSimpleName());

The data repo has a similar feature in its DataRepoBuilder.build(...) method for specifying subclasses. This allows you to seemless query fields form subclasses and classes in the same repo or searchable collection.

RickHigh
  • 1,808
  • 20
  • 16
5

You have a lot of really constrictive requirements are appear to be very particular to your needs. Most of the things you are saying aren't viable are because a lot so of people have the same exact needs which basically defines a basic database engine. That is why they are "large" libraries. You say "no database" but at its core every indexing system is a "database" of terms and documents. I would argue that a Collection is a "database". I would say take a look at Space4J.

I would say if you don't find what you are looking for, start a project on GitHub and get on with coding it yourself and sharing the results.

  • 1
    That's a very good point: I need part of the functionality of a database in memory! I don't need ACID, I don't even need multiple tables, I don't need to access it using a comfortable query language. But I do need one piece of it: Multiple unique indexes. It's possible, that Space4J is currently the best answer to that. I must admit, that I haven't heard of that project before - +1 for the link! However, I do think, that the problem is very common, just as common as invoking a "CREATE UNIQUE INDEX ... ON ..." in SQL. – Chris Lercher Mar 29 '10 at 21:07
  • 3
    Space4J has a good approach to indexation, because even in memory you may want indexed searches when your table starts to grow. Take a look here: http://forum.space4j.org/posts/list/5.page – TraderJoeChicago Sep 08 '11 at 16:50
4

Google Collections LinkedListMultimap

About your first requirement

  • When a Value is removed, all index entries associated with that value must be removed.

I think There is neither a library nor a Helper that supports it.

Here is how i have done by using LinkedListMultimap

Multimap<Integer, String> multimap = LinkedListMultimap.create();

// Three duplicates entries
multimap.put(1, "A");
multimap.put(2, "B");
multimap.put(1, "A");
multimap.put(4, "C");
multimap.put(1, "A");

System.out.println(multimap.size()); // outputs 5

To get your first requirement, a Helper can play a good job

public static <K, V> void removeAllIndexEntriesAssociatedWith(Multimap<K, V> multimap, V value) {
    Collection<Map.Entry<K, V>> eCollection = multimap.entries();
    for (Map.Entry<K, V> entry : eCollection)
        if(entry.getValue().equals(value))
            eCollection.remove(entry);
}

...

removeAllIndexEntriesAssociatedWith(multimap, "A");

System.out.println(multimap.size()); // outputs 2

Google collections is

  • lightweight
  • Supported by Joshua Block (Effective Java)
  • Nice features as ImmutableList, ImmutableMap and so on
Arthur Ronald
  • 33,349
  • 20
  • 110
  • 136
2

I've written a Table interface that includes methods like

V put(R rowKey, C columnKey, V value) 
V get(Object rowKey, Object columnKey) 
Map<R,V> column(C columnKey) 
Set<C> columnKeySet()
Map<C,V> row(R rowKey)
Set<R> rowKeySet()
Set<Table.Cell<R,C,V>> cellSet()

We'd like to include it in a future Guava release, but I don't know when that would happen. http://code.google.com/p/guava-libraries/issues/detail?id=173

Jared Levy
  • 1,986
  • 13
  • 12
  • @Jared Levy: Thanks, good to hear, that Guava is considering this topic! I'm considering to switch from Apache Commons Collections to Guava :-) Can you expand a little bit on the usage of the table interface? At first glance, it looks more like it provides something similar to what a composite primary key for a database table does (where you have to get *both* parts of the key right to access the value)? Or does it also provide multiple indexes (where you choose *one* part of the key, and then only need to get that part right to access the value)? – Chris Lercher Mar 28 '10 at 13:51
  • You can specify 0, 1, or 2 keys when accessing a Table. Unfortunately, it might be a while until we open-source it. – Jared Levy Mar 28 '10 at 18:34
  • That's a nice data structure, which I believe will show its strength especially, when applied recursively (tables in tables)! It seems to be different from the structure I'm looking for in this question, though. Jay's example is what I'm basically looking for (multiple *unique* indexes; delete an element, and it's deleted from all indexes) But I want to find something like that in a common library, because lots of people seem to find it strikingly normal, that they have to re-implement the same thing over and over again. – Chris Lercher Mar 28 '10 at 19:48
2

Your main goal seems to be that you'll remove the object from all indexes when you remove it from one.

The simplest approach will be to add another layer of indirection: you store your actual object in a Map<Long,Value>, and use a bidirectional map (which you'll find in Jakarta Commons and probably Google Code) for your indexes as Map<Key,Long>. When you remove an entry from a particular index, you'll take the Long value from that index and use it to remove the corresponding entries from the main map and the other indexes.

One alternative to the BIDIMap is to define your "index" maps as Map<Key,WeakReference<Long>>; however, this will require you to implement a ReferenceQueue for cleanup.


Another alternative is to create a key object that can take an arbitrary tuple, define its equals() method to match on any element in the tuple, and use that with a TreeMap. You can't use a HashMap, because you won't be able to compute a hashcode based on just one element of the tuple.

public class MultiKey
implements Comparable<Object>
{
   private Comparable<?>[] _keys;
   private Comparable _matchKey;
   private int _matchPosition;

   /**
    *  This constructor is for inserting values into the map.
    */
   public MultiKey(Comparable<?>... keys)
   {
      // yes, this is making the object dependent on externally-changable
      // data; if you're paranoid, copy the array
      _keys = keys;
   }


   /**
    *  This constructor is for map probes.
    */
   public MultiKey(Comparable key, int position)
   {
      _matchKey = key;
      _matchPosition = position;
   }


   @Override
   public boolean equals(Object obj)
   {
      // verify that obj != null and is castable to MultiKey
      if (_keys != null)
      {
         // check every element
      }
      else
      {
         // check single element
      }
   }


   public int compareTo(Object o)
   {
      // follow same pattern as equals()
   }
}
Anon
  • 345
  • 2
  • 6
2

If you want multiple indexes on your data, you can create and maintain multiple hash maps or use a library like Data Store:

https://github.com/jparams/data-store

Example:

Store<Person> store = new MemoryStore<>() ;
store.add(new Person(1, "Ed", 3));
store.add(new Person(2, "Fred", 7));
store.add(new Person(3, "Freda", 5));
store.index("name", Person::getName);
Person person = store.getFirst("name", "Ed");

With data store you can create case-insensitive indexes and all sorts of cool stuff. Worth checking out.

1

Use Prefuse Tables. They support as many indices as you want, are fast (indices are TreeMaps), and have nice filtering options (boolean filters? no problem!). No database required, tested with large data-sets in many information visualization applications.

In their raw form, they are not as convenient as standard containers (you need to deal with rows and columns), but you can surely write a small wrapper around that. Plus, they plug nicely into UI components such as Swing's JTables.

tucuxi
  • 17,561
  • 2
  • 43
  • 74
0

Basically a solution based on multiple hash maps would be possible, but in this case all of them have to be keped up-to-date manually. A very simple integrated solution can be found here: http://insidecoffe.blogspot.de/2013/04/indexable-hashmap-implementation.html

Jan Wiemer
  • 31
  • 2
0

I'm not sure I understand the question, but I think what you're asking for is multiple ways to map from different, unique keys to values and appropriate clean-up when a value goes away.

I see that you don't want to roll your own, but there's a simple enough composition of map and multimap (I used the Guava multimap below, but the Apache one should work as well) to do what you want. I have a quick and dirty solution below (skipped the constructors, since that depends on what sort of underlying map/multimap you want to use):

package edu.cap10.common.collect;

import java.util.Collection;
import java.util.Map;

import com.google.common.collect.ForwardingMap;
import com.google.common.collect.Multimap;

public class MIndexLookupMap<T> extends ForwardingMap<Object,T>{

    Map<Object,T> delegate;
    Multimap<T,Object> reverse;

    @Override protected Map<Object, T> delegate() { return delegate; }

    @Override public void clear() {
        delegate.clear();
        reverse.clear();
    }

    @Override public boolean containsValue(Object value) { return reverse.containsKey(value); }

    @Override public T put(Object key, T value) {
        if (containsKey(key) && !get(key).equals(value)) reverse.remove(get(key), key); 
        reverse.put(value, key);
        return delegate.put(key, value);
    }

    @Override public void putAll(Map<? extends Object, ? extends T> m) {
        for (Entry<? extends Object,? extends T> e : m.entrySet()) put(e.getKey(),e.getValue());
    }

    public T remove(Object key) {
        T result = delegate.remove(key);
        reverse.remove(result, key);
        return result;
    }

    public void removeValue(T value) {
        for (Object key : reverse.removeAll(value)) delegate.remove(key);
    }

    public Collection<T> values() {
        return reverse.keySet();
    }   

}

removal is O(number of keys), but everything else is the same order as a typical map implementation (some extra constant scaling, since you also have to add things to the reverse).

I just used Object keys (should be fine with appropriate implementations of equals() and hashCode() and key distinction) - but you could also have a more specific type of key.

Carl
  • 7,538
  • 1
  • 40
  • 64
  • note, this does not do multiple values per key, but could straightforwardly enough with forward being a multimap, and adding a get(Object...keys) method which also did some collection intersection operations. – Carl Mar 29 '10 at 21:35
  • This implementation does not support modification via `entrySet()`, `values()` and `keySet()`. These methods offer `clear()` and `iterator().remove()` and Entry's have `setValue(...)`. And i probably missed some more. All these modifications are not considered in the index. – Frederic Leitenberger Feb 10 '17 at 17:38
0

Here is how i am achieving this, right now only put,remove and get methods are working for rest you need to override desired methods.

Example:

MultiKeyMap<MultiKeyMap.Key,String> map = new MultiKeyMap<>();
MultiKeyMap.Key key1 = map.generatePrimaryKey("keyA","keyB","keyC");
MultiKeyMap.Key key2 = map.generatePrimaryKey("keyD","keyE","keyF");

map.put(key1,"This is value 1");
map.put(key2,"This is value 2");

Log.i("MultiKeyMapDebug",map.get("keyA"));
Log.i("MultiKeyMapDebug",map.get("keyB"));
Log.i("MultiKeyMapDebug",map.get("keyC"));

Log.i("MultiKeyMapDebug",""+map.get("keyD"));
Log.i("MultiKeyMapDebug",""+map.get("keyE"));
Log.i("MultiKeyMapDebug",""+map.get("keyF"));

Output:

MultiKeyMapDebug: This is value 1
MultiKeyMapDebug: This is value 1
MultiKeyMapDebug: This is value 1
MultiKeyMapDebug: This is value 2
MultiKeyMapDebug: This is value 2
MultiKeyMapDebug: This is value 2

MultiKeyMap.java:

/**
 * Created by hsn on 11/04/17.
 */


public class MultiKeyMap<K extends MultiKeyMap.Key, V> extends HashMap<MultiKeyMap.Key, V> {

    private Map<String, MultiKeyMap.Key> keyMap = new HashMap<>();

    @Override
    public V get(Object key) {
        return super.get(keyMap.get(key));
    }

    @Override
    public V put(MultiKeyMap.Key key, V value) {
        List<String> keyArray = (List<String>) key;
        for (String keyS : keyArray) {
            keyMap.put(keyS, key);
        }
        return super.put(key, value);
    }

    @Override
    public V remove(Object key) {
        return super.remove(keyMap.get(key));
    }

    public Key generatePrimaryKey(String... keys) {
        Key singleKey = new Key();
        for (String key : keys) {
            singleKey.add(key);
        }
        return singleKey;
    }

    public class Key extends ArrayList<String> {

    }

}
H4SN
  • 1,482
  • 3
  • 24
  • 43
0

lets look at project http://code.google.com/p/multiindexcontainer/wiki/MainPage This is generalized way how to use maps for JavaBean getters and perform lookups over indexed values. I think this is what you are looking for. Lets give it a try.

Fekete Kamosh
  • 361
  • 1
  • 5
  • 18
0

I found this code years ago (can't remember where). Just use this to build your key from different values.

import java.util.Arrays;
public final class MultiKey {
    private static final int PRIME = 31;

    private final Object[] values;

    private final int hashCode;

    /**
     * Creates a new instance from the provided values. It is assumed that the
     * values provided are good map keys themselves -- immutable, with proper
     * implementations of equals() and hashCode().
     *
     * @param values
     */
    public MultiKey(Object... values) {
        this.values = values;
        hashCode = PRIME * Arrays.hashCode(this.values);
    }

    @Override
    public int hashCode() {
        return hashCode;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        final MultiKey other = (MultiKey) obj;

        return Arrays.equals(values, other.values);
    }

    @Override
    public String toString() {
        StringBuilder builder = new StringBuilder("MultiKey[");

        boolean first = true;

        for (Object o : values) {
            if (!first)
                builder.append(", ");

            builder.append(o);

            first = false;
        }

        builder.append("]");

        return builder.toString();
    }

}
wutzebaer
  • 14,365
  • 19
  • 99
  • 170