
Let's say I have a list A of size 285. The first sub-list must contain 228 elements of A (80% of 285); the second and third must each contain 10% of A. The sub-lists must not share any elements, and the whole process should be randomized.

I'm aware of random.choices() and random.sample(), but I end up having common elements.

    How about just shuffling the list using numpy with `np.random.shuffle()` and then subdividing it into the three parts of different length? This would avoid having duplicates. – johannesack Mar 01 '20 at 14:04
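The comment's suggestion can be sketched like this (a minimal example assuming a list of 285 distinct numbers; the 228/28/29 sizes follow from the 80%/10%/10% slicing):

```python
import numpy as np

A = list(range(285))          # example data: 285 distinct numbers
b = np.array(A)               # work on a copy so A is untouched
np.random.shuffle(b)          # in-place shuffle of the copy

n = len(b)
a1 = b[:int(0.8 * n)]                # 80% -> 228 elements
a2 = b[int(0.8 * n):int(0.9 * n)]    # next 10% -> 28 elements
a3 = b[int(0.9 * n):]                # remainder -> 29 elements
```

Because the three slices are disjoint ranges of one shuffled array, no element can appear in more than one sub-list.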

3 Answers


Depending on the type of the elements, you can put them into a hash map, using a hashing function you define.

Then iterate through the keys and assign them to your required sublists based on the counts.
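A minimal sketch of this idea (the element type is assumed hashable, and a plain `dict` stands in for the hash map; the variable names are illustrative):

```python
import random

A = list(range(285))  # example data

# A dict keeps one entry per distinct element (insertion-ordered in
# Python 3.7+); this is the "hash map" step described above.
pool = list(dict.fromkeys(A))
random.shuffle(pool)  # randomize the iteration order

# Required counts: 80%, 10%, and whatever remains.
n = len(pool)
counts = [int(0.8 * n), int(0.1 * n)]
counts.append(n - sum(counts))

# Walk the keys once and fill each sublist up to its count.
sublists, it = [], iter(pool)
for c in counts:
    sublists.append([next(it) for _ in range(c)])

a1, a2, a3 = sublists
```

Since every key is consumed exactly once from a single iterator, the sublists are guaranteed to be disjoint and to cover all of A.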

Madhu Avinash
  • 933
  • 2
  • 8
  • 27

We can use a technique commonly used in machine learning to partition data into training and test datasets.

Steps are:

  • Use random.shuffle to create a random ordering of data
  • Partition the shuffled data based upon the sizes of the desired sublists

Code

import random

def partition_list(a):
    """Partition a list into random sublists with 80%/10%/10% splits"""
    b = a[:]  # shallow copy so the input list is left untouched
    random.shuffle(b)  # in-place shuffle
    n = len(b)

    # Slices are disjoint but together cover all the elements
    a1 = b[:int(0.8 * n)]
    a2 = b[int(0.8 * n):int(0.9 * n)]
    a3 = b[int(0.9 * n):]

    return a1, a2, a3

Test Code

A = list(range(285))  # test using list of numbers from 0 to 284
a1, a2, a3 = partition_list(A)

print('a1:', len(a1))
print('a2:', len(a2))
print('a3:', len(a3))

Output

a1: 228
a2: 28
a3: 29
DarrylG
  • This is exactly what I did after asking the question. Thanks, anyway. Do you have any idea how to do this without slicing? – Sarthak Maharana Mar 02 '20 at 16:38
  • @SarthakMaharana--another method is to use the routine `train_test_split` from sklearn twice, as described in blitu12345's answer to [How to split data into 3 sets -train, validation and test](https://stackoverflow.com/questions/38250710/how-to-split-data-into-3-sets-train-validation-and-test). – DarrylG Mar 02 '20 at 17:21
  • I wasn't searching for that, but thanks anyway :) – Sarthak Maharana Apr 06 '20 at 13:25

If the order doesn't matter, it's simple: random.shuffle the entire list, and then take slices of the needed sizes.

If you need to pick out some elements and keep them in order, it gets trickier. The best I can think of is to just go through it mechanically: use random.sample to get the indices of the elements you want for the first sub-list; make that list; then remove those index positions and repeat for more sub-lists. To separate out the elements cleanly and avoid logic errors, we can use list comprehensions to build the sub-list as well as the new "remaining" pool. If you're using numpy, this can probably be done better with masks.
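The order-preserving approach can be sketched as follows (the helper name `ordered_splits` and the fraction tuple are illustrative, not from the answer):

```python
import random

def ordered_splits(a, fractions=(0.8, 0.1, 0.1)):
    """Split `a` into disjoint sublists whose elements keep their
    original relative order, with sizes given by `fractions`."""
    remaining = list(a)
    result = []
    n = len(a)
    for frac in fractions[:-1]:
        # random.sample picks k distinct index positions at random.
        k = int(frac * n)
        picked = set(random.sample(range(len(remaining)), k))
        # List comprehensions keep the original order in both the
        # chosen sublist and the new "remaining" pool.
        result.append([x for i, x in enumerate(remaining) if i in picked])
        remaining = [x for i, x in enumerate(remaining) if i not in picked]
    result.append(remaining)  # last sublist takes whatever is left
    return result

a1, a2, a3 = ordered_splits(list(range(285)))
```

With a sorted input like `range(285)`, each returned sublist comes out sorted too, which confirms the relative order is preserved.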

Karl Knechtel