The following is some code to generate a sample dataframe:

import pandas as pd

fruits = pd.DataFrame()
fruits['month'] = ['jan','feb','feb','march','jan','april','april','june','march','march','june','april']
fruits['fruit'] = ['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
ind = fruits.index
ind_mnth = fruits['month'].values
fruits['price'] = [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60]
# Build a two-level index (month, row number); the columns themselves are unchanged
fruits_grp = fruits.set_index([ind_mnth, ind], drop=False)

How can I shuffle the outer index level randomly and, within each outer group, shuffle the inner index level in a separate random order in this MultiIndex DataFrame?

ifly6
  • You actually want to remove the association between the inner and outer indexes? – Henry Ecker Jul 22 '21 at 14:23
  • No, I want a two-level shuffle: first shuffle the outer index (months), then shuffle the inner index within each outer index value (month). – Medha Chippa Jul 22 '21 at 14:31
  • Please see this similar task: https://stackoverflow.com/questions/55054185/outer-index-to-ascending-inner-index-to-descending-in-multi-index-pandas. The difference is that I would like to shuffle both the outer index and the inner index in random order. – Medha Chippa Jul 22 '21 at 14:36
  • You can just sample the dataframe `df.sample(frac=1)` – Henry Ecker Jul 22 '21 at 14:44
  • Does this answer your question? [Shuffle DataFrame rows](https://stackoverflow.com/questions/29576430/shuffle-dataframe-rows) – Henry Ecker Jul 22 '21 at 14:44

1 Answer

Assuming this dataframe with MultiIndex as input:

          month   fruit  price
jan   0     jan   apple     30
feb   1     feb  orange     20
      2     feb    pear     40
march 3   march  orange     25
jan   4     jan   apple     30
april 5   april    pear     45
      6   april  cherry     60
june  7    june    pear     45
march 8   march  orange     25
      9   march  cherry     55
june  10   june   apple     37
april 11  april  cherry     60

First shuffle all rows of the DataFrame, then regroup them by month by selecting with .loc on a randomly ordered array of the unique outer-index values:

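import numpy as np

np.random.seed(0)  # fixed seed so the example is reproducible
# unique outer-level labels (months), shuffled in place
idx0 = np.unique(fruits_grp.index.get_level_values(0))
np.random.shuffle(idx0)
# shuffle all rows, then pull the months out in the shuffled order
fruits_grp.sample(frac=1).loc[idx0]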

output:

          month   fruit  price
jan   0     jan   apple     30
      4     jan   apple     30
april 6   april  cherry     60
      5   april    pear     45
      11  april  cherry     60
feb   1     feb  orange     20
      2     feb    pear     40
june  10   june   apple     37
      7    june    pear     45
march 8   march  orange     25
      9   march  cherry     55
      3   march  orange     25
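
The same two steps can be wrapped in a small function that uses a local random generator instead of the global NumPy seed. This is only a sketch: the name `shuffle_two_levels` and its `seed` parameter are made up for illustration, and it assumes a pandas version whose `sample` accepts a NumPy `Generator` as `random_state`.

```python
import numpy as np
import pandas as pd


def shuffle_two_levels(df, seed=None):
    """Shuffle all rows, then regroup them in a shuffled outer-level order.

    Hypothetical helper, not part of pandas.
    """
    rng = np.random.default_rng(seed)
    # Shuffling every row randomizes the inner order within each month.
    shuffled = df.sample(frac=1, random_state=rng)
    # Draw a random order for the unique outer-level labels.
    labels = shuffled.index.get_level_values(0).unique().to_numpy()
    outer = rng.permutation(labels)
    # Selecting with a list of labels returns the groups in that order
    # and keeps the shuffled row order inside each group.
    return shuffled.loc[outer]


# Small frame with the same (month, row number) index layout as above.
months = ['jan', 'feb', 'feb', 'march', 'jan', 'april']
df = pd.DataFrame({'month': months, 'price': [30, 20, 40, 25, 30, 45]})
df = df.set_index([df['month'].to_numpy(), df.index])

result = shuffle_two_levels(df, seed=0)
print(result)
```

Passing the same `seed` gives the same ordering again, while `seed=None` draws a fresh shuffle on each call.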
mozway