Randomly Select Rows in Pandas DataFrame Vectorized Operation

Question

I want to select a random row during a vectorized operation on a DataFrame. This is what my inpDF looks like:

    string1    string2
0   abc        dfe
1   ghi        jkl
2   mno        pqr
3   stu        vwx

I'm trying to find the function getRandomRow() here:

outDF['string1'] = inpDF['string1']
outDF['string2'] = inpDF.getRandomRow()['string2']

so that the outDF ends up looking (for example) like this:

    string1    string2
0   abc        jkl
1   ghi        pqr
2   mno        dfe
3   stu        pqr

EDIT 1:

I tried using the sample() function as suggested in this answer, but that just causes the same sample to be replicated across all rows:

outDF['string1'] = inpDF['string1']
outDF['string2'] = inpDF.sample(n=1).iloc[0,:]['string2']

which gives:

    string1    string2
0   abc        pqr
1   ghi        pqr
2   mno        pqr
3   stu        pqr

EDIT 2:

For my particular use case, even picking the value from the row 'n' rows further down would suffice. So I tried this (I'm using inpDF.index based on what I read in this answer):

numRows = len(inpDF)

outDF['string1'] = inpDF['string1']
outDF['string2'] = inpDF.iloc[(inpDF.index + 2)%numRows,:]['string2']

but it just ends up picking the value from the same row, and the outDF comes out like this:

    string1    string2
0   abc        dfe
1   ghi        jkl
2   mno        pqr
3   stu        vwx

whereas I expected this:

    string1    string2
0   abc        pqr
1   ghi        vwx
2   mno        dfe
3   stu        jkl

score 1 · Answer 1 · answered Mar 28 '19 at 15:31

Try np.random.shuffle(), which shuffles the column in place:

import numpy as np

np.random.shuffle(df.string2)  # reorders the values of string2 in place
print(df)

  string1 string2
0     abc     pqr
1     ghi     vwx
2     mno     dfe
3     stu     jkl

If you don't want to shuffle in place, try:

df['string3'] = np.random.permutation(df.string2)
print(df)
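
Note that a permutation uses every value exactly once, whereas the example output in the question repeats pqr. If each row should instead draw its own random row independently (that is, with replacement), a minimal sketch could look like the following. The names inp_df, out_df, rng and the seed are my own assumptions for illustration, not something from the question:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (snake_case names are illustrative)
inp_df = pd.DataFrame({
    "string1": ["abc", "ghi", "mno", "stu"],
    "string2": ["dfe", "jkl", "pqr", "vwx"],
})

# The seed is only here to make the sketch reproducible
rng = np.random.default_rng(seed=0)

# One random row position per output row, drawn with replacement
positions = rng.integers(0, len(inp_df), size=len(inp_df))

out_df = pd.DataFrame({"string1": inp_df["string1"]})
# .to_numpy() drops the index, so the values are assigned by position
out_df["string2"] = inp_df["string2"].to_numpy()[positions]
print(out_df)
```

Because the positions are drawn independently, the same value can show up in several rows, and a row may occasionally pick its own value.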

anky


Nice answer with `np.random.shuffle` +1 – Erfan Mar 28 '19 at 15:36

score 1 · Accepted Answer · answered Mar 28 '19 at 15:36

You can use pandas.Series.sample for this:

df['string2'] = df.string2.sample(len(df.string2)).to_list()

print(df)
  string1 string2
0     abc     vwx
1     ghi     jkl
2     mno     dfe
3     stu     pqr

Or, equivalently:

df['string2'] = df.string2.sample(len(df.string2)).values
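
The to_list() / .values step matters because pandas aligns a Series on its index when it is assigned to a column, so a shuffled Series is put straight back into its original order. The same alignment explains why the shifted iloc selection in EDIT 2 of the question appears to do nothing. A small sketch, assuming the frame from the question (the variable names are illustrative, and np.roll is a substitute I am using for the shift, not something from the question):

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question
inp_df = pd.DataFrame({
    "string1": ["abc", "ghi", "mno", "stu"],
    "string2": ["dfe", "jkl", "pqr", "vwx"],
})

# A shuffled Series keeps its original index labels attached to the values
shuffled = inp_df["string2"].sample(frac=1, random_state=0)

aligned = inp_df.copy()
# Assigning a Series aligns on index labels, which undoes the shuffle
aligned["string2"] = shuffled

by_position = inp_df.copy()
# A plain array has no index, so the values are assigned by position
by_position["string2"] = shuffled.to_numpy()

# EDIT 2 from the question: take the value from n rows further down, wrapping around
n = 2
shifted = inp_df.copy()
shifted["string2"] = np.roll(inp_df["string2"].to_numpy(), -n)
print(shifted)
```

With n = 2 the shifted column comes out as pqr, vwx, dfe, jkl, which is the output the question expected from EDIT 2.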

Erfan


If I don't add `to_list()` or `values`, I get the values of that column back unshuffled. Why is that? – shinvu Mar 28 '19 at 15:40
It might be a bit confusing in `pandas`, but we can refer to a column with brackets, like `df['string2']`, or with dot notation, like `df.string2`. Both are the same; use whichever you prefer, @shinvu – Erfan Mar 28 '19 at 15:42
Good question @shinvu, I was wondering this myself while answering your question. I just posted this as a new question; you can follow it here: https://stackoverflow.com/questions/55401864/why-does-my-new-column-does-net-get-assigned-after-using-sample-method – Erfan Mar 28 '19 at 15:52
