Python: get ten random values from a pandas DataFrame

Question

I am trying to build an algorithm for finding the number of clusters, and I need to pick random points from the data set as the initial means.

I first tried the following code:

mu=random.sample(df,10)

It gave an "index out of range" error.

I then converted the DataFrame to a NumPy array and ran:

mu=random.sample(np.array(df).tolist(),10)

Instead of 10 values to use as means, this gives me 10 arrays of values.

How can I get 10 values from the DataFrame to initialise the means of 10 clusters?

score 4 · Answer 1 · answered Jan 17 '17 at 07:09

I think you need DataFrame.sample:

mu = df.sample(10)
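A minimal sketch for the clustering use case, assuming the algorithm wants the initial means as a NumPy array with one row per cluster. With several feature columns, each initial mean is a whole row, so getting 10 arrays (one per cluster) is the expected shape. `DataFrame.sample` draws without replacement by default, so no row is picked twice (rows with identical values can still coincide), and `random_state` makes the draw reproducible. The names `k` and `initial_means` are illustrative; `to_numpy()` needs pandas 0.24 or later (use `.values` on older versions).

```python
import numpy as np
import pandas as pd

# Toy data set: 20 points with 3 features
df = pd.DataFrame(np.random.randint(10, size=(20, 3)), columns=list('abc'))

k = 10  # number of clusters, i.e. number of initial means
# Draw k distinct rows; random_state=0 makes the draw repeatable
initial_means = df.sample(n=k, random_state=0)

# Convert to a (k, n_features) array for NumPy-based clustering code
mu = initial_means.to_numpy()
print(mu.shape)  # (10, 3)
```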

Example:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(20,3)), columns=list('abc'))
print(df)
    a  b  c
0   8  8  3
1   7  7  0
2   4  2  5
3   2  2  2
4   1  0  8
5   4  0  9
6   6  2  4
7   1  5  3
8   4  4  3
9   7  1  1
10  7  7  0
11  2  9  9
12  3  2  5
13  8  1  0
14  7  6  2
15  0  8  2
16  5  1  8
17  1  5  4
18  2  8  3
19  5  0  9

mu = df.sample(10)
print(mu)
    a  b  c
11  2  9  9
1   7  7  0
8   4  4  3
5   4  0  9
2   4  2  5
19  5  0  9
13  8  1  0
14  7  6  2
0   8  8  3
9   7  1  1
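To show how the sampled rows might then serve as initial means, here is a sketch of one assignment and update step in the style of standard k-means (Lloyd's algorithm). That this matches the asker's algorithm is an assumption. Note that the example frame contains duplicate rows (rows 1 and 10 are both `7 7 0`), so two sampled means can be identical and one cluster may end up empty; the sketch keeps the old mean in that case instead of averaging an empty selection.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(20, 3)), columns=list('abc'))

k = 10
# Each sampled row is one initial mean (a point in 3-D feature space)
mu = df.sample(k).to_numpy(dtype=float)
X = df.to_numpy(dtype=float)

# Squared Euclidean distance from every point to every mean: shape (20, k)
dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
labels = dists.argmin(axis=1)  # index of the nearest mean for each point

# Update step: average the points in each cluster; keep the old mean if empty
new_mu = np.array([
    X[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
    for j in range(k)
])
print(labels)
print(new_mu.shape)  # (10, 3)
```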

score 4 · Accepted Answer · answered Jan 17 '17 at 08:07

Use numpy.random.choice with replace=False, so that no row is picked twice:

df.iloc[np.random.choice(np.arange(len(df)), 10, replace=False)]
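`np.random.choice` also accepts an integer `n` and then samples from `np.arange(n)`, so `np.random.choice(len(df), 10, replace=False)` is equivalent. A sketch of the same idea with NumPy's newer `Generator` API (NumPy 1.17+), assuming a seeded generator is wanted for reproducible initial means; the seed `42` and the toy frame are arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(60).reshape(20, 3), columns=list('abc'))

rng = np.random.default_rng(42)  # seeded Generator for reproducible draws
# Choose 10 distinct row positions out of len(df); replace=False avoids repeats
positions = rng.choice(len(df), size=10, replace=False)
mu = df.iloc[positions]  # positional selection: exactly 10 rows
print(mu)
```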

Or shuffle the index with numpy.random.permutation and take the first 10 labels:

df.loc[np.random.permutation(df.index)[:10]]

    a  b  c
11  2  9  9
1   7  7  0
16  5  1  8
15  0  8  2
17  1  5  4
19  5  0  9
10  7  7  0
8   4  4  3
6   6  2  4
14  7  6  2
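The first attempt in the question fails because `random.sample` expects a sequence, and a DataFrame does not behave like a sequence of rows (`df[i]` looks up a column, not a row). Sampling row *positions* and selecting them with `iloc` fixes this with the standard library alone. The sketch below also uses a hypothetical frame with a repeated index (as can happen after `pd.concat`) to show why positional `iloc` is the safer choice there: label-based `loc` returns every row carrying a chosen label, so it can return more than 10 rows.

```python
import random

import pandas as pd

# Hypothetical frame whose index labels repeat: 0, 1, 0, 1, ...
df = pd.DataFrame({'a': range(20), 'b': range(20, 40)}, index=[0, 1] * 10)

random.seed(0)
# random.sample needs a sequence, so sample row positions, not the DataFrame
positions = random.sample(range(len(df)), 10)
mu = df.iloc[positions]  # iloc is positional, so exactly 10 rows come back
print(len(mu))  # 10

# Label-based .loc returns every row labelled 0, which is 10 rows here
print(len(df.loc[[0]]))  # 10
```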
