Shuffling a dataframe

Question

I have the following Pandas dataframe:

import pandas as pd
timestamps = [pd.Timestamp(2015,1,1), pd.Timestamp(2015,1,3), pd.Timestamp(2015,4,1), pd.Timestamp(2015,11,1)]
quantities = [1, 16, 9, 4]
e_quantities = [1, 4, 3, 2]
data = dict(quantities=quantities, e_quantities=e_quantities)
df = pd.DataFrame(data=data, columns=data.keys(), index=timestamps)

which looks like this:

            quantities  e_quantities
2015-01-01           1             1
2015-01-03          16             4
2015-04-01           9             3
2015-11-01           4             2

I want to shuffle the values in all of the columns, but not the index, while keeping the values in each row together. I've done this:

import numpy as np
indices_scrambled = np.arange(0, len(timestamps))
np.random.shuffle(indices_scrambled)
df.quantities = df.quantities.values[indices_scrambled]
df.e_quantities = df.e_quantities.values[indices_scrambled]

which works and produces:

            quantities  e_quantities
2015-01-01          16             4
2015-01-03           9             3
2015-04-01           1             1
2015-11-01           4             2

but it doesn't extend well if I add lots of columns, as I have to keep writing df.column_1 = df.column_1.values[indices_scrambled], df.column_2 = df.column_2.values[indices_scrambled], etc.

Is there a way to scramble all of the columns in the data frame at once, except for the index one?

Thanks for any help here!

scramble the columns means to shuffle the order of the columns at random? like if order was `[1,2,3,4,5]` you would like something like `[2,5,1,4,3]`? — Yuca, Apr 05 '19 at 15:24
@Yuca Exactly -- but keep the index values the same after that — ajrlewis, Apr 05 '19 at 16:08
you have two answers, but this might help you too https://stackoverflow.com/a/51935892/9754169 — Yuca, Apr 05 '19 at 16:13

score 2 · Answer 1 · answered Apr 05 '19 at 15:15

This should work for you:

from sklearn.utils import shuffle
index = df.index
df = shuffle(df)
df.index = index
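
If scikit-learn is not installed, the same idea can be sketched with pandas alone. This is a minimal sketch, assuming the `df` from the question; the names `original_index` and `shuffled` and the `random_state` value are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
timestamps = [pd.Timestamp(2015, 1, 1), pd.Timestamp(2015, 1, 3),
              pd.Timestamp(2015, 4, 1), pd.Timestamp(2015, 11, 1)]
df = pd.DataFrame({"quantities": [1, 16, 9, 4],
                   "e_quantities": [1, 4, 3, 2]}, index=timestamps)

original_index = df.index

# sample(frac=1) returns every row exactly once, in random order,
# so the values within each row stay together.
# random_state is an illustrative seed; drop it for a different order each run.
shuffled = df.sample(frac=1, random_state=0)

# Put the original (unshuffled) index back on the shuffled rows.
shuffled.index = original_index
print(shuffled)
```

Because whole rows are moved, this should not need any per-column code however many columns the frame has.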

Ксения Староверова


score 0 · Answer 2 · answered Apr 05 '19 at 15:31

Try the code below; it uses the same np.random.shuffle() in a loop over the columns:

for col in df.columns.to_list():
    np.random.shuffle(df[col])

print(df)

hacker315


Thanks. What I wanted was to keep the rows matched but shuffle their order, and keep the index as it was in the beginning. Taking your example: `for col in df.columns.to_list(): df[col].values[:] = df[col][indices_scrambled].values` achieves this, but it's quite verbose! – ajrlewis Apr 05 '19 at 16:57
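
The requirement in the comment above (one shared permutation for every column, original index kept) can probably be written without a loop. This is a minimal sketch, assuming the `df` from the question; `rng`, `perm` and `shuffled` are illustrative names that do not appear in the thread.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
timestamps = [pd.Timestamp(2015, 1, 1), pd.Timestamp(2015, 1, 3),
              pd.Timestamp(2015, 4, 1), pd.Timestamp(2015, 11, 1)]
df = pd.DataFrame({"quantities": [1, 16, 9, 4],
                   "e_quantities": [1, 4, 3, 2]}, index=timestamps)

# One random permutation of the row positions, shared by all columns.
rng = np.random.default_rng()
perm = rng.permutation(len(df))

# Reorder whole rows by position, then restore the original index.
shuffled = df.iloc[perm].copy()
shuffled.index = df.index
print(shuffled)
```

Using a single permutation for the whole frame is what keeps each `quantities` value next to its `e_quantities` value; shuffling each column separately would draw a different order per column.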
