Python get all combinations of sub-sampling from list

Question

I have two lists: A with 49 elements and B with 68 elements.

I want to iterate over all possible subsamples of size 32 from the lists.

What is the best way to do it in Python?

Read [this](https://stackoverflow.com/questions/374626/how-can-i-find-all-the-subsets-of-a-set-with-exactly-n-elements) — GalAbra, Jun 23 '19 at 07:35
You have 49 elements in A, 68 in B, and want every possible sample? Like a0b0, a0b1, ..., a0b67, a1b0 and so on? And what about `32` anyway? — crissal, Jun 23 '19 at 07:35
@crissal No, there should be no repetitions: a0b0, a1b5, a2b1, a9b6, ... — Cranjis, Jun 23 '19 at 07:40
I hope you've plenty of time, it's going to take a while to loop over all that — 6502, Jun 23 '19 at 07:41
So you want N tuples made with 32 elements from A (first position) and 32 elements from B (second position) without repetitions? — crissal, Jun 23 '19 at 07:44

yaho cho · Answer 1 · 2019-06-23T07:42:45.373

You can use the combinations function from itertools. Replace the sample data with your own lists and change the size argument of combinations from 2 to 32. Please refer to the following sample code:

from itertools import combinations

# Small stand-ins for the real lists A and B
items_a = ['a', 'b', 'c']
items_b = ['1', '2', '3', '4', '5']

# All size-2 subsets of each list (use 32 instead of 2 for the real data)
print(list(combinations(items_a, 2)))
print(list(combinations(items_b, 2)))

The result:

[('a', 'b'), ('a', 'c'), ('b', 'c')]
[('1', '2'), ('1', '3'), ('1', '4'), ('1', '5'), ('2', '3'), ('2', '4'), ('2', '5'), ('3', '4'), ('3', '5'), ('4', '5')]
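If the goal is to pair every size-k subset of A with every size-k subset of B (an assumption, since the question does not pin this down), a lazy generator is one way to sketch it. The helper name `paired_subsamples` is hypothetical. Nested loops are used on purpose: `itertools.product` would first consume both combination streams into memory before yielding anything.

```python
from itertools import combinations, islice

def paired_subsamples(list_a, list_b, k):
    """Lazily yield (subset_of_a, subset_of_b) pairs, each subset of size k."""
    # Hypothetical helper: nested loops keep the iteration lazy, so no
    # list of all combinations is ever built in memory.
    for sub_a in combinations(list_a, k):
        for sub_b in combinations(list_b, k):
            yield sub_a, sub_b

items_a = ['a', 'b', 'c']
items_b = ['1', '2', '3', '4', '5']

# Look at only the first few pairs instead of materializing everything.
first_three = list(islice(paired_subsamples(items_a, items_b, 2), 3))
print(first_three)
```

With the toy lists this yields 3 * 10 = 30 pairs in total. With 49 and 68 elements and k = 32 the generator still starts instantly, but it would never finish in practice.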

edited Jun 23 '19 at 07:42

answered Jun 23 '19 at 07:36


Please note that the output should be 32 elements from list A and 32 elements from list B – Cranjis Jun 23 '19 at 07:39
@what do you mean? – Cranjis Jun 23 '19 at 07:43
@okuoub Please provide your sample data, and please share your code snippet if you have one. – yaho cho Jun 23 '19 at 08:00

blackening · Answer 2 · 2019-06-23T08:07:52.463

You don't want to do that.

You have two lists of size 49 and 68. You want size-32 subsets of each.

There are about 6.5e12 and 2.5e19 such subsets, respectively. Multiplying them gives about 1.6e32 pairs. That's a very big number.

So here's a rule of thumb. 10^9 is roughly the number of operations per second. About 3 × 10^16 is one year of that... You don't want to think about 10^32.

Let's put it this way. The hashrate of the entire Bitcoin network is about 70,000,000 trillion hashes per second, or 2.2e27 per year. That's still short a few zeros.
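The counts can be checked directly. This is a minimal sketch using `math.comb` (available from Python 3.8) with the sizes from the question, 49 and 68; the figure of 10^9 operations per second is only the rule-of-thumb assumption from above, not a measurement.

```python
from math import comb  # requires Python 3.8+

n_a = comb(49, 32)   # size-32 subsets of a 49-element list
n_b = comb(68, 32)   # size-32 subsets of a 68-element list
total = n_a * n_b    # every subset of A paired with every subset of B

print(f"subsets of A: {n_a:.2e}")
print(f"subsets of B: {n_b:.2e}")
print(f"pairs:        {total:.2e}")

# Assumed rule of thumb: one pair processed per operation, 1e9 ops/second.
OPS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 3600
years = total / (OPS_PER_SECOND * SECONDS_PER_YEAR)
print(f"years needed: {years:.1e}")
```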

Change your algorithm.
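One possible change, offered only as a sketch and not something the question confirms is acceptable, is to draw a manageable number of random subsamples instead of enumerating all of them. The helper name `random_subsample_pairs` and the `n_draws` and `seed` parameters are hypothetical.

```python
import random

def random_subsample_pairs(list_a, list_b, k, n_draws, seed=None):
    """Yield n_draws random (subset_of_a, subset_of_b) pairs of size k each."""
    rng = random.Random(seed)  # a seed makes the draws reproducible
    for _ in range(n_draws):
        # sample() picks k distinct positions, so no element is reused
        # within a single subsample.
        yield rng.sample(list_a, k), rng.sample(list_b, k)

list_a = list(range(49))  # stand-in data for list A
list_b = list(range(68))  # stand-in data for list B

pairs = list(random_subsample_pairs(list_a, list_b, 32, n_draws=5, seed=0))
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))
```

Different draws may occasionally repeat the same subset, which is usually harmless given how many subsets exist.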
