I have the following dataframe:

df = pd.DataFrame({'A':range(10), 'B':range(10), 'C':range(10), 'D':range(10)})

I would like to shuffle the data using the below function:

import pandas as pd

import numpy as np

def shuffle(df, n=1, axis=0):
    df = df.copy()
    for _ in range(n):
        df.apply(np.random.shuffle, axis=axis)
    return df

However, I do not want to shuffle columns A and D, only columns B and C. Is there a way to do this by amending the function? I want to say: if the column is 'A' or 'D', then don't shuffle it.

Thanks

Mrmoleje

1 Answer

You could shuffle the required columns as below:

import numpy as np
import pandas as pd

# the data
df = pd.DataFrame({'A': range(10), 'B': range(10),
                   'C': range(10), 'D': range(10)})

# shuffle: permutation returns a shuffled copy, which is assigned back
df.B = np.random.permutation(df.B)
df.C = np.random.permutation(df.C)

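# or shuffle this way (in place)
# note: recent NumPy versions may warn when shuffling a Series directly,
# and with pandas copy-on-write enabled df itself may not change,
# so the assignment form above is the safer choice
np.random.shuffle(df.B)
np.random.shuffle(df.C)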
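
If the shuffle needs to be repeatable, one option is NumPy's seeded generator. This is only a sketch: the seed value 0 and the name `shuffled` are illustrative choices, not part of the question. Each column is permuted independently, so values in B and C no longer line up row by row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': range(10),
                   'C': range(10), 'D': range(10)})

# Seeded generator so the same shuffle comes out on every run
# (the seed 0 is an arbitrary, illustrative choice).
rng = np.random.default_rng(0)

shuffled = df.copy()  # leave the original frame untouched
for col in ['B', 'C']:
    # permutation returns a new shuffled array; assigning it replaces the column
    shuffled[col] = rng.permutation(shuffled[col].to_numpy())
```

Columns A and D are never assigned to, so they keep their original order in `shuffled`.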

If you need to shuffle using your shuffle function:

def shuffle(df, n=1):
    df = df.copy()  # work on a copy, as in your original function
    for _ in range(n):
        # shuffle B
        df['B'] = np.random.permutation(df['B'])
        # shuffle C
        df['C'] = np.random.permutation(df['C'])
        print(df.B, df.C)   # comment this out as needed

    return df

Columns A and D are never touched, so they keep their original order.
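
To get closer to the "if column is 'A' or 'D' then don't shuffle" wording in the question, the function could take the columns to skip as a parameter. The following is a rough sketch under that assumption; the name `shuffle_except` and the `exclude` and `seed` parameters are hypothetical and not from the original function.

```python
import numpy as np
import pandas as pd

def shuffle_except(df, exclude=('A', 'D'), n=1, seed=None):
    """Return a copy of df with every column not in `exclude` shuffled n times."""
    rng = np.random.default_rng(seed)  # seed=None gives a different shuffle each call
    out = df.copy()
    for _ in range(n):
        for col in out.columns:
            if col in exclude:
                continue  # leave this column in its original order
            # each remaining column is permuted independently of the others
            out[col] = rng.permutation(out[col].to_numpy())
    return out

df = pd.DataFrame({'A': range(10), 'B': range(10),
                   'C': range(10), 'D': range(10)})
result = shuffle_except(df, exclude=('A', 'D'), seed=42)
```

Passing a different `exclude` tuple changes which columns are left alone, so the same function works if more columns are added to the frame later.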

smile