Pandas - How do you randomize the rows of a dataframe

Question

I am trying to shuffle the order of the rows in my dataframe while keeping each row's values together, so that the dependentVariable column ends up in a random order. I have the following dataframe:

df
   columnOne columnTwo dependentVariable
0  TAG       321511    0
1  ID        1111      0
2  ID        2222      0
3  system    1         0
4  TAG       252524    0
5  ID        3333      0
6  ID        4444      0
7  ID        5555      1
8  ID        6666      1 
9  TAG       343536    1
10 Local     22        1 
11 ID        7777      1

I want to randomize the rows, for example:

df
   columnOne columnTwo dependentVariable
0  TAG       321511    0
8  ID        6666      1
1  ID        1111      0
2  ID        2222      0
9  TAG       343536    1
3  system    1         0
10 Local     22        1
4  TAG       252524    0
11 ID        7777      1
5  ID        3333      0
6  ID        4444      0
7  ID        5555      1

Then reset the index with:

 df = df.reset_index(drop=True)

Desired output:

df
   columnOne columnTwo dependentVariable
0  TAG       321511    0
1  ID        6666      1
2  ID        1111      0
3  ID        2222      0
4  TAG       343536    1
5  system    1         0
6  Local     22        1
7  TAG       252524    0
8  ID        7777      1
9  ID        3333      0
10 ID        4444      0
11 ID        5555      1
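The shuffle-then-reset steps described above can also be chained using pandas' built-in DataFrame.sample. This is a minimal sketch, not part of the original question: it rebuilds the example frame by hand, and random_state=42 is an arbitrary seed used only to make the result reproducible (drop it for a different shuffle on every run).

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "columnOne": ["TAG", "ID", "ID", "system", "TAG", "ID",
                  "ID", "ID", "ID", "TAG", "Local", "ID"],
    "columnTwo": [321511, 1111, 2222, 1, 252524, 3333,
                  4444, 5555, 6666, 343536, 22, 7777],
    "dependentVariable": [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})

# frac=1 draws every row exactly once (sampling without replacement),
# which is a full shuffle; whole rows move, so columns stay aligned.
# reset_index(drop=True) then renumbers the rows 0..n-1.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```

Because sample picks whole rows, each columnTwo value keeps its original dependentVariable; only the row order changes.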

score 1 · Accepted Answer · edited Jul 04 '19 at 10:12

You can shuffle the row positions with NumPy and then select the rows by position:

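import numpy as np
import pandas as pd

df = pd.DataFrame(['A','B','C','D','E','F','G','H','I','j'], columns=['Data'])

arr = np.arange(len(df))
out = np.random.permutation(arr)  # random shuffle of the row positions

df = df.iloc[out].reset_index(drop=True)  # .ix was removed in pandas 1.0; .iloc selects by position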
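A variant of the same approach, sketched under a few assumptions: it uses NumPy's newer Generator API (np.random.default_rng, available since NumPy 1.17) instead of np.random.permutation, the seed 0 is arbitrary, and the small frame with letter index labels is made up to show that .iloc does not require a numeric index.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with non-numeric index labels
df = pd.DataFrame(
    {
        "columnTwo": [321511, 1111, 2222, 1, 252524, 3333],
        "dependentVariable": [0, 0, 0, 0, 1, 1],
    },
    index=list("abcdef"),
)

# Seeded Generator so the shuffle is reproducible
rng = np.random.default_rng(0)
order = rng.permutation(len(df))  # shuffled row positions 0..n-1

# .iloc selects by position, so the letter labels do not matter;
# each row moves as a unit, keeping columnTwo paired with dependentVariable.
shuffled = df.iloc[order].reset_index(drop=True)
print(shuffled)
```

Using a Generator keeps the random state local to rng instead of relying on NumPy's global seed.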

answered Aug 14 '18 at 15:45

flyingmeatball
