1

How can I make a List or similar Collection that represents a subset of another collection, masking (filtering out) unwanted elements but without creating an entirely new collection structure?

The second list could then be more quickly modified by enabling/disabling the visibility of individual elements. Doing so should be more performant than constantly rebuilding a second separate collection structure.

I imagine some kind of bit-map view into the original collection.

Ideally this would be thread-safe, and would fail-fast if the original collection were modified.

For example, two masked collections backed by the same master collection:

  • The master collection might be [ dog , cockatiel , cat , lion , wolf , hummingbird ].
  • A masked collection might be named canine containing [ dog , wolf ] with no references to the other elements.
  • Another masked collection might be name feline containing [ cat , lion ].

Another example: We have a list of many LocalDate objects. The user selects some of those dates for some purpose, perhaps selecting only weekdays but not weekends. Then the user changes their selection, making a manual exception for certain dates. Rather than build a new list each time of selected elements, a masked collection would be tracking which ones are selected with the others being ignored by omission.

Basil Bourque
  • 303,325
  • 100
  • 852
  • 1,154
  • 1
    If you're intending to filter based on characteristics of each element, see http://stackoverflow.com/questions/8458663/guava-why-is-there-no-lists-filter-function for lots of reasons why this approach is *less* performant. If you just want to filter based on index in the original backing list, you should be explicit about that. – Matt McHenry Aug 27 '16 at 02:13
  • 2
    There's no way to give a meaningful answer with the info you've provided. The drill for data structure design is _always_ the same. Specify 1) the exact operations that need to be done (insert, delete, query, etc.) 2) their relative frequency, 3) importance of space vs. time and any other tradeoffs. Only then can you consider alternatives. Don't start - as you have - with preconceptions about the best solution. Often, as @MattMcHenry has pointed out, a tricky solution that looks like it ought to be efficient is beaten badly by a simpler one that doesn't. – Gene Aug 27 '16 at 02:36
  • What are you planning on doing with the filtered list? There is a good chance that a filtered stream could be used. – bradimus Aug 27 '16 at 02:48
  • I might be wrong, so please don't blame me. access time would be the same since you are not creating a new structure you will have to filter the collection what would be more efficient is to classify the content at insertion time (could be HashMap with String mask as key and a List as value containing your elements) access then would be faster. but this depends on the nature of data you are treating whether separable or not – whyn0t Aug 27 '16 at 02:52
  • So you want to store all the information in the list, but when iterating it only look at certain items in the list that you want? You could just save off some predicates to the side and use `myList.stream().filter(predicateForFirstSubList).forEach(...` when you want to iterate over specific elements. But I am really not sure how this would perform any better compared to just using two or more lists. – NESPowerGlove Aug 27 '16 at 02:59
  • 3
    You may want to ask yourself whether you are doing [premature optimization](http://c2.com/cgi/wiki?PrematureOptimization). If a more straightforward design performs acceptably, you will not need to complicate matters. – Ole V.V. Aug 27 '16 at 03:24
  • @OleV.V. THIS. Don't waste time with fancy optimizations until you know you need them. You won't know you need them here until your application proves to perform poorly and your profiling indicates that rebuilding the filte list is a bottleneck. – Jim Garrison Aug 27 '16 at 03:40

1 Answers1

0

I would use filter categories using mask bits. Each different type of element would have different mask bits:

enum CategoryBits {
    CATEGORY_A = 0x0001;
    CATEGORY_B = 0x0002; 
    CATEGORY_C = 0x0004;
}

Then for define your mask bits on which categories you want to include:

enum MaskBits {
    filter1 = CATEGORY_A; //gets only CATEGORY_A items
    filter2 = CATEGORY_A | CATEGORY_B; //gets CATEGORY_A and CATEGORY_B items
}

So every item in your collection would have a function:

CategoryBits categoryBits() const { return CATEGORY_A; } //return whatever category this item is

Depending on which items you want to filter you can then set the MASK_BITS for the container. So now for filtering you just have to do:

if(MASK_BITS & item.category() != 0) { //this item is in the categories you want }

Which is very fast.

macco
  • 469
  • 4
  • 12
  • The question is tagged java. – bradimus Aug 27 '16 at 02:44
  • @bradimus Sorry didn't see the tag. The concept is exactly the same in java though. Using bitmasks is fast. – macco Aug 27 '16 at 03:06
  • The answer is fails the most basic OOP concepts. You want to force the objects to implement a functionality just so a list can filter them. Why should those objects even be aware that they will be in a list or filtered? What if you don't own the code for the objects in the list? – bradimus Aug 27 '16 at 03:24
  • Suppose I have a `List`. How do I implement your solution? – bradimus Aug 27 '16 at 03:26
  • You can wrap the objects if you don't own the code. Yes this is a trade-off. You get more efficient code at the expense of it being less Object Oriented. – macco Aug 27 '16 at 03:28
  • More efficient than what? – bradimus Aug 27 '16 at 03:34
  • If you want to setup different filters and filter objects by category, then bit mask filtering is about the most efficient you can get. (You only have to do a bitwise and, and a compare against zero) But on re-reading Basil's question, I'm not entirely sure if thats what he wants. He will have to clarify. – macco Aug 27 '16 at 03:55