I have two lists of strings like the following:
test1 = ["abc", "abcdef", "abcedfhi"]
test2 = ["The", "silver", "proposes", "the", "blushing", "number", "burst", "explores", "the", "fast", "iron", "impossible"]
The second list is longer, so I want to downsample it to the length of the first list by randomly sampling.
import random

def downsample(data):
    # Shrink every list to the length of the shortest one by random sampling.
    min_len = min(len(x) for x in data)
    return [random.sample(x, min_len) for x in data]

downsample([test1, test2])
However, I want to add a restriction: the words chosen from the second list must match the length distribution of the first list. So the first word that is randomly chosen must have the same length as the first word of the shorter list. The complication is that the sampling must also be done without replacement.
How can I randomly select n elements (n being the length of the shorter list) from test2 so that they match the character length distribution of test1?
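To make the desired behaviour concrete, here is a rough sketch of what I have in mind, not a finished solution. The function name `sample_matching_lengths` and its `rng` parameter are hypothetical names I made up for illustration. It assumes each position in test2 counts as a separate element (so the repeated "the" can be drawn more than once, as with random.sample) and that it should fail loudly when test2 runs out of words of a required length.

```python
import random
from collections import defaultdict

def sample_matching_lengths(reference, pool, rng=random):
    """For each word in `reference`, draw one unused word of the same
    length from `pool` (hypothetical helper, sampling without replacement)."""
    # Group the pool words by character length (copies, so `pool` is untouched).
    by_length = defaultdict(list)
    for word in pool:
        by_length[len(word)].append(word)

    # Shuffle each group so that popping from the end is a random draw.
    for words in by_length.values():
        rng.shuffle(words)

    result = []
    for ref_word in reference:
        candidates = by_length.get(len(ref_word))
        if not candidates:
            # Assumption: running out of matching words is an error.
            raise ValueError(
                f"no unused word of length {len(ref_word)} left in pool"
            )
        # pop() removes the word, so it cannot be chosen again.
        result.append(candidates.pop())
    return result

test1 = ["abc", "abcdef", "abcedfhi"]
test2 = ["The", "silver", "proposes", "the", "blushing", "number",
         "burst", "explores", "the", "fast", "iron", "impossible"]

picked = sample_matching_lengths(test1, test2)
print(picked)  # one possible result: ['the', 'number', 'blushing']
```

With the lists above, the result always has word lengths 3, 6 and 8, in the same order as test1.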
Thanks,
Jack