I have a data frame of the following form:
1 2 3 4 5 6 7 8
A C C T G A T C
C A G T T A D N
Y F V H Q A F D
I need to randomly select a column k times, where k is the number of columns in the given sample. My program creates a list of k empty lists and then appends a randomly selected column from the data frame to each one. Each column may be selected only once, so the result must not contain duplicates.
For the example data frame above, the expected output would be something like:
[[2][4][6][1][7][3][5][8]]
However, I am getting results like:
[[1][1][3][6][7][8][8][2]]
What is the most Pythonic way to do this? Here is my sorry attempt:
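k = len(df.columns)
# One empty list per column
k_clusters = [[] for _ in range(k)]
for i in range(len(k_clusters)):
    for j in range(i + 1, len(k_clusters)):
        # Each call to sample() is an independent draw, so repeats are possible
        k_clusters[i].append(df.sample(1, axis=1))
        # If two clusters match, redraw for the later one
        if k_clusters[i] == k_clusters[j]:
            k_clusters[j].pop(0)
            k_clusters[j].append(df.sample(1, axis=1))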
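
For comparison, here is a minimal sketch of what I think sampling without replacement would look like. It shuffles all the columns in one call instead of drawing them one at a time. The integer column labels 1 to 8 and the choice to store labels rather than the column data are assumptions made for the example.

```python
import pandas as pd

# Example frame from the question; integer column labels 1..8 are assumed.
df = pd.DataFrame(
    [list("ACCTGATC"), list("CAGTTADN"), list("YFVHQAFD")],
    columns=range(1, 9),
)

# frac=1 with axis=1 returns every column in random order.
# sample() draws without replacement by default, so no column repeats.
shuffled = df.sample(frac=1, axis=1)

# One single-element list per selected column label.
k_clusters = [[col] for col in shuffled.columns.tolist()]
print(k_clusters)
```

If the lists should hold the column data rather than the labels, `[[shuffled[col]] for col in shuffled.columns]` should work the same way. Passing `random_state` to `sample` makes the order reproducible.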