
I have the following Pandas dataframe:

import pandas as pd
timestamps = [pd.Timestamp(2015,1,1), pd.Timestamp(2015,1,3), pd.Timestamp(2015,4,1), pd.Timestamp(2015,11,1)]
quantities = [1, 16, 9, 4]
e_quantities = [1, 4, 3, 2]
data = dict(quantities=quantities, e_quantities=e_quantities)
df = pd.DataFrame(data=data, columns=data.keys(), index=timestamps)

which looks like this:

            quantities  e_quantities
2015-01-01           1             1
2015-01-03          16             4
2015-04-01           9             3
2015-11-01           4             2

I want to shuffle the rows of every column except the index, keeping the values in each row together (the index itself should stay in its original order). I've done this:

import numpy as np
# One random ordering of the row positions, applied to every column below
indices_scrambled = np.arange(0, len(timestamps))
np.random.shuffle(indices_scrambled)
df.quantities = df.quantities.values[indices_scrambled]
df.e_quantities = df.e_quantities.values[indices_scrambled]

which works and produces:

            quantities  e_quantities
2015-01-01          16             4
2015-01-03           9             3
2015-04-01           1             1
2015-11-01           4             2

but it doesn't scale well to lots of columns, because I have to keep writing `df.column_1 = df.column_1.values[indices_scrambled]`, `df.column_2 = df.column_2.values[indices_scrambled]`, etc.

Is there a way to scramble all of the columns in the DataFrame at once, leaving the index alone?

Thanks for any help here!

ajrlewis
  • By "scramble the columns", do you mean shuffling their order at random? E.g. if the order was `[1,2,3,4,5]`, you would like something like `[2,5,1,4,3]`? – Yuca Apr 05 '19 at 15:24
  • @Yuca Exactly -- but keep the index values the same after that – ajrlewis Apr 05 '19 at 16:08
  • You have two answers, but this might help you too: https://stackoverflow.com/a/51935892/9754169 – Yuca Apr 05 '19 at 16:13
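One way to scramble every column at once is to draw a single permutation of row positions, reorder the whole frame with `iloc`, and then put the original index back. The sketch below assumes NumPy 1.17+ (for `np.random.default_rng`) and pandas 1.x or 2.x (where `set_axis(..., axis=0)` returns a new frame); the helper name `shuffle_rows_keep_index` and the seed are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def shuffle_rows_keep_index(df, seed=None):
    """Hypothetical helper: permute whole rows, keep the index labels where they were."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(df))  # one random order of row positions, shared by all columns
    # Reorder the rows positionally, then overwrite the shuffled index with the original one
    return df.iloc[perm].set_axis(df.index, axis=0)


timestamps = [pd.Timestamp(2015, 1, 1), pd.Timestamp(2015, 1, 3),
              pd.Timestamp(2015, 4, 1), pd.Timestamp(2015, 11, 1)]
df = pd.DataFrame({"quantities": [1, 16, 9, 4], "e_quantities": [1, 4, 3, 2]},
                  index=timestamps)

shuffled = shuffle_rows_keep_index(df, seed=0)
print(shuffled)
```

Because one `perm` is applied to the whole frame, adding more columns needs no extra code, and values that shared a row before still share a row afterwards.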

2 Answers


This should work for you:

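from sklearn.utils import shuffle
index = df.index    # keep the original index
df = shuffle(df)    # permute whole rows (the index labels move with them)
df.index = index    # restore the original index order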
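If you would rather not pull in scikit-learn for this, pandas' own `DataFrame.sample(frac=1)` also returns every row exactly once in random order, with each row's index label travelling with it; that is why the answer saves the index first and reassigns it afterwards. A pandas-only sketch of the same steps, assuming nothing beyond pandas itself (the `random_state=42` value is arbitrary and only makes the result reproducible):

```python
import pandas as pd

timestamps = [pd.Timestamp(2015, 1, 1), pd.Timestamp(2015, 1, 3),
              pd.Timestamp(2015, 4, 1), pd.Timestamp(2015, 11, 1)]
df = pd.DataFrame({"quantities": [1, 16, 9, 4], "e_quantities": [1, 4, 3, 2]},
                  index=timestamps)

index = df.index                               # remember the original index
shuffled = df.sample(frac=1, random_state=42)  # every row exactly once, in random order
# The index labels were shuffled together with the rows; put the original order back.
shuffled.index = index
print(shuffled)
```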

Try the following, which uses NumPy to shuffle each column in a loop over the columns:

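for col in df.columns.to_list():
    # Each column gets its own independent shuffle (assigned back explicitly,
    # since shuffling df[col] in place may not write through to df)
    df[col] = np.random.permutation(df[col].values)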

print(df)
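Note that this loop draws a separate permutation for every column, so a value can end up next to a value from a different original row, which is what the comment below points out. A small deterministic illustration, with two permutations picked by hand (rather than drawn at random) to stand in for two independent shuffles:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"quantities": [1, 16, 9, 4], "e_quantities": [1, 4, 3, 2]})

# Two different orderings, standing in for two independent per-column shuffles.
perm_a = np.array([1, 2, 0, 3])
perm_b = np.array([3, 0, 1, 2])

independent = pd.DataFrame({
    "quantities": df["quantities"].to_numpy()[perm_a],
    "e_quantities": df["e_quantities"].to_numpy()[perm_b],
})
# One shared ordering applied to every column keeps each row's values together.
shared = df.iloc[perm_a].reset_index(drop=True)

original_rows = set(zip(df["quantities"], df["e_quantities"]))
print(set(zip(independent["quantities"], independent["e_quantities"])) == original_rows)  # False
print(set(zip(shared["quantities"], shared["e_quantities"])) == original_rows)            # True
```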
hacker315
  • Thanks. What I wanted was to keep the rows matched but shuffle their order, and keep the index as it was in the beginning. Taking your example: `for col in df.columns.to_list(): df[col] = df[col].values[indices_scrambled]` achieves this, but it's quite verbose! – ajrlewis Apr 05 '19 at 16:57
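A self-contained version of the loop from the comment, assigning each column through one shared permutation. As a sanity check, fixing the permutation to `[1, 2, 0, 3]` (an assumption made here instead of drawing it at random) reproduces the scrambled table shown in the question:

```python
import numpy as np
import pandas as pd

timestamps = [pd.Timestamp(2015, 1, 1), pd.Timestamp(2015, 1, 3),
              pd.Timestamp(2015, 4, 1), pd.Timestamp(2015, 11, 1)]
df = pd.DataFrame({"quantities": [1, 16, 9, 4], "e_quantities": [1, 4, 3, 2]},
                  index=timestamps)

# Fixed here for reproducibility; in practice use np.random.permutation(len(df)).
indices_scrambled = np.array([1, 2, 0, 3])

for col in df.columns:
    # Same permutation for every column, so rows stay matched; the index is untouched.
    df[col] = df[col].to_numpy()[indices_scrambled]

print(df)
```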