3

Given a list of 256 numbers in order (0-255), I want to express a subset of 128 numbers from that list. Every number will be unique and not repeated.

What is the most compact way to express this subset?

What I've come up with so far is having a 256 length bit-array and setting the appropriate indexes to 1. This method obviously requires 256 bits to represent the 128 values but is there a different, more space-saving way?

Thanks!

Kozzy
  • 37
  • 4
  • Is there a relation between the selected numbers? – xtofl Apr 03 '17 at 12:30
  • @xtofl - no there is no relation besides that the 128 numbers are elements from the 256 number list – Kozzy Apr 03 '17 at 12:32
  • Are you trying to express an arbitrary subset or is there a method you will use to determine the indices? – Abion47 Apr 03 '17 at 12:33
  • 256 bits is 32 bytes which is not bad. – jdweng Apr 03 '17 at 12:34
  • In theory, the fact that exactly half of the numbers is selected, could save another bit... – xtofl Apr 03 '17 at 12:35
  • @Abion47 - The subset will be arbitrary and random but will always contain 128 unique elements from the 256 number list – Kozzy Apr 03 '17 at 12:36
  • @jdweng - 32 bytes is not bad, but can we go smaller!? – Kozzy Apr 03 '17 at 12:42
  • Do you need to be able to place the numbers in the subset back into their original positions in the larger set? – Abion47 Apr 03 '17 at 12:48
  • @Abion47 - No, the subset's order does not matter and will be un-related to the larger set – Kozzy Apr 03 '17 at 12:51
  • Then just use an array of half the size. Can't get much more compact than that. – Abion47 Apr 03 '17 at 12:54
  • You can't get smaller than 32 bytes (8 * 32 = 256) for selecting an array of 256 items. Even though you only want to select 128 of the items. – jdweng Apr 03 '17 at 13:00
  • And how many space you are expecting to save? Say if you can save 4 bits - would that help in the problem you are solving? – Evk Apr 03 '17 at 13:15
  • @Evk - Ideally i would like to use 128 bits but i believe that is just a pipe dream – Kozzy Apr 03 '17 at 13:17

2 Answers2

0

Since you don't care about the order of the subset nor do you care about restoring each element to its position in the original array, this is simply a case of producing a random subset of an array, which is similar to drawing cards from a deck.

To take unique elements from an array, you can simply shuffle the source array and then take a number of elements at the first X indices:

int[] srcArray = Enumerable.Range(0, 256).ToArray();

Random r = new Random();
var subset = srcArray.OrderBy(i => r.Next()).Take(128).ToArray();

Note: I use the above randomizing method to keep the example concise. For a more robust shuffling approach, I recommend the Fisher-Yates algorithm as described in this post.

Community
  • 1
  • 1
Abion47
  • 22,211
  • 4
  • 65
  • 88
  • I'm trying to represent the subset in less than 256 bits though. This would create an array that can't be expressed that way! Appreciate the effort though – Kozzy Apr 03 '17 at 13:06
  • @Kozzy Well what you're trying to do can't be done, so this is the way to do it that would produce much the same effect. The only thing I could think of would be to shuffle the source array and then arbitrarily define the subset as being the first X elements of the array, but that wouldn't be a separate representation of the subset so I wouldn't say that counts. – Abion47 Apr 03 '17 at 13:11
  • I was afraid that was the answer. It was worth a shot to see if I missed something simple. I appreciate your time and help! – Kozzy Apr 03 '17 at 13:17
0

There is 256! / (128! * (256 - 128)!) unique combinations of 128 elements from a set of 256 items, when order does not matter (see wiki about combinations).

If you calculate that number and take base-2 logarithm - you will find that it's 251.6. That means you need at least 252 bits to represent unique selection of 128 items out of 256. Since .NET anyway cannot represent bits (only whole bytes) - there is no reason to actually find out how this could be done.

128 is the worst number in that regard. If you were selecting say 5 elements or 251 out of 256 - that could have been represented with 34 bits and it would have been useful to try and find that kind of effective representation.

Evk
  • 98,527
  • 8
  • 141
  • 191
  • What if the subset contains 64 elements from the 256 number list? Does that allow for a more compressed representation? 64 bits or less perhaps? – Kozzy Apr 03 '17 at 13:50
  • For 64 that would still be 204 bits (so 26 whole bytes), so you can save 6 bytes, in theory. – Evk Apr 03 '17 at 13:57