5

Say I have the following classes:

public class Tagged {

    private List<String> tags;
}

public class ContainerOfTagged {

    private List<Tagged> tagged;
}

With this structure, whenever I need to find a Tagged with a specific tag, I need to iterate over all the tagged in ContainerOfTagged, and iterating over all tags of each Tagged. That could affect performance depending on the size of the lists.

A simple solution would be changing the ContainerOfTagged class to use a Map, mapping tags in lists of Tagged:

public class ContainerOfTagged {

    private Map<String, List<Tagged>> tagMapping;
}

Now all I need to do is provide a tag, and the Map will return all Tagged with said tag. However, by doing this I'm causing data duplication, since the same tags exist in both the Tagged and ContainerOfTagged classes.

So, is there a way to solve this problem with a performatic solution that doesn't duplicate data?

Lii
  • 11,553
  • 8
  • 64
  • 88
Flumen
  • 169
  • 2
  • 12
  • Do you have so much data in the `ContainerOfTagged` that the memory usage is really an issue? – Andy Turner Aug 28 '16 at 16:28
  • No, you can't avoid the duplication if you don't want to iterate. – Jimmy T. Aug 28 '16 at 16:35
  • Is an array an option ? Or can Tagged be an enum ? Also if the nunmber of tags is reasonable you can have a ContainerOfTagged for each tag. – c0der Aug 28 '16 at 16:43
  • @AndyTurner Memory usage is not a issue at the time. The current issue is performance when finding all Tagged with a specific tag. – Flumen Aug 28 '16 at 16:44
  • @Flumen then just duplicate the data. – Andy Turner Aug 28 '16 at 16:45
  • @c0der Sorry, but I fail to see how arrays or enums would help here. The ideia is that these classes would be part of project A, and other projects (which will depend on project A) would define the tags, so I can't know them beforehand. – Flumen Aug 28 '16 at 16:51
  • @AndyTurner The problem with data duplication is that whenever I add/remove a tag in one place, I'll have to do so in the other place. In the long run, it will most likely cause many bugs. – Flumen Aug 28 '16 at 16:56
  • If the number of tags is reasonable you can have a ContainerOfTagged for each tag which may make the search shorter. It will add some data but will not duplicate data. – c0der Aug 28 '16 at 17:10
  • 1
    There are solutions for this, and this questions has been raised before on this site with different formulations. Other questions describe this as bi-multimaps, or as N:M relationships (using database terminology). Examples: http://stackoverflow.com/questions/8066109/bidirectional-multi-valued-map-in-java and http://stackoverflow.com/questions/3926642/how-to-implement-nm-relation-in-jva – Lii Aug 28 '16 at 19:12

1 Answers1

2

You can't really avoid "duplicating" the tags, but remember that you are not really duplicating them as the Lists and Maps only store references to the tag string, not the values (however, the references are likely to take up quite a lot of space in themselves).

The problem is that you need two indexes:

  1. You need to find the list of tags, given the Tagged object.
  2. You need to find the Tagged object, given a tag.

Ideally, your solution would look like this.You can solve your concerns about things getting out-of-sync by having a single method to manage tagging.

Note that in Tagged you should use a Set instead of a list to avoid duplication of tags.

public class Tagged {
    Set<String> tags;
}

public class TagContainer {
    Map<String, Tagged> tagIndex;

    public tag(String tag, Tagged tagged) {
        tagged.tags.add(tag);
        tagIndex.put(tag, tagged);
    }

If memory utilisation is a major concern you could try some kind of reference compression. Using this technique, you could store your tags in a array and then refer to them by index. If you had few enough, you could use a byte or short instead of a reference, but the code would be a lot messier and I would not recommend it.

EDIT:

In my first post, I proposed that Tagged should be an interface called Tagable. This is cleaner, but lengthens the solution, so I reverted to a class. Howevever, you could perhaps consider having a Tagable interface and implement this in the Tagged class.

public interface Tagable {
    Set<String> getTags;
    tag(String tag);
}
rghome
  • 8,529
  • 8
  • 43
  • 62