6

Is there an algorithm (preferably constant time) to check if set A is a subset of set B?

Creating the data structures to facilitate this problem does not count against the runtime.

volni
  • 5,196
  • 8
  • 38
  • 44
  • 1
    Found this answer: http://stackoverflow.com/a/1338515/174674 – volni Oct 05 '12 at 23:47
  • 1
    We need more informations about the set content. General algorithms won't give you a constant time complexity. At least, none I know of. – Samy Arous Oct 05 '12 at 23:51
  • The set elements are strings but of course we can run them through some hash or assign them positions in a bitset if that would yield a faster algorithm. – volni Oct 06 '12 at 00:08
  • If you have more information about the strings specifically it might be easier to exploit some fact about them. – argentage Oct 06 '12 at 01:22

3 Answers3

1

Well, you're going to have to look at each element of A, so it must be at least linear time in the size of A.

An O(A+B) algorithm is easy using hashtables (store elements of B in a hashtable, then look up each element of A). I don't think you can do any better unless you know some advance structure for B. For instance, if B is stored in sorted order, you can do O(A log B) using binary search.

Keith Randall
  • 22,985
  • 2
  • 35
  • 54
  • If you sort both sets them you can compare the head item of the two collections. The performance of this algorithm is O(A + B) – Miguel Sep 07 '13 at 03:07
0

You might go for bloom filter (http://en.wikipedia.org/wiki/Bloom_filter ). However there might be false positives, which can be addressed by the method mentioned by Keith above (but note that the worst case complexity of hashing is NOT O(n), but you can do O(nlogn).

  1. See if A is a subset of B according to Bloom filter
  2. If yes, then do a thorough check
ElKamina
  • 7,747
  • 28
  • 43
  • I like this algorithm because doing some post processing is very fast in my case. The bloom filter would run on the server and post processing the result set would run client side. – volni Oct 06 '12 at 00:09
0

If you have a list of the least common letters and pairs of letters in your string set, you can store your sets sorted with their least common letters and letter pairs and maximize your chances of tossing out negative matches as quickly as possible. It's not clear to me how well this would combine with a bloom filter, Probably a hash table will do since there aren't very many digrams and letters.

If you had some information about the maximum size of subsets or even a common size you could similarly preproccess data by putting all of the subsets of a given size into a bloom filter as mentioned.

You could also do a combination of both of these.

argentage
  • 2,758
  • 1
  • 19
  • 28