       A            B            C               D              E
0   165349.20   136897.80    471784.10        New York      192261.83
1   162597.70   151377.59    443898.53        California    191792.06
2   153441.51   101145.55    407934.54        Florida       191050.39
3   144372.41   118671.85    383199.62        New York      182901.99
4   142107.34   91391.77     366168.42        Florida       166187.94

After running df = pd.get_dummies(df, columns=['D']) I get:

           A          B          C          E  D_California  D_Florida  D_New York
0  165349.20  136897.80  471784.10  192261.83             0          0           1
1  162597.70  151377.59  443898.53  191792.06             1          0           0
2  153441.51  101145.55  407934.54  191050.39             0          1           0
3  144372.41  118671.85  383199.62  182901.99             0          0           1
4  142107.34   91391.77  366168.42  166187.94             0          1           0

Is there a way to get the output to look like this without selecting the columns explicitly, i.e. without df[['A','B','C','D_California','D_New York','D_Florida','E']]?

           A          B          C  D_California  D_Florida  D_New York          E
0  165349.20  136897.80  471784.10             0          0           1  192261.83
1  162597.70  151377.59  443898.53             1          0           0  191792.06
2  153441.51  101145.55  407934.54             0          1           0  191050.39
3  144372.41  118671.85  383199.62             0          0           1  182901.99
4  142107.34   91391.77  366168.42             0          1           0  166187.94
Zale Goldart

3 Answers

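By sorting the columns with sort_index (this works here only because the column names happen to be in alphabetical order):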

df.sort_index(axis=1)
           A          B          C  D_California  D_Florida  D_NewYork          E
0  165349.20  136897.80  471784.10             0          0          1  192261.83
1  162597.70  151377.59  443898.53             1          0          0  191792.06
2  153441.51  101145.55  407934.54             0          1          0  191050.39
3  144372.41  118671.85  383199.62             0          0          1  182901.99
4  142107.34   91391.77  366168.42             0          1          0  166187.94
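For reference, a self-contained sketch of this approach on the question's sample data (the DataFrame construction is an assumption; the values are copied from the question). It also shows the limitation: sort_index(axis=1) orders the labels as plain strings, so it only reproduces the original layout when the original names happen to sort that way.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'A': [165349.20, 162597.70, 153441.51, 144372.41, 142107.34],
    'B': [136897.80, 151377.59, 101145.55, 118671.85, 91391.77],
    'C': [471784.10, 443898.53, 407934.54, 383199.62, 366168.42],
    'D': ['New York', 'California', 'Florida', 'New York', 'Florida'],
    'E': [192261.83, 191792.06, 191050.39, 182901.99, 166187.94],
})

# get_dummies appends D_California, D_Florida, D_New York after E
encoded = pd.get_dummies(df, columns=['D'])

# sort_index(axis=1) sorts the column labels as strings
result = encoded.sort_index(axis=1)
print(list(result.columns))
# ['A', 'B', 'C', 'D_California', 'D_Florida', 'D_New York', 'E']

# The real column names from the comments do not sort into their original order
real_names = ['R&D Spend', 'Administration', 'Marketing Spend', 'State_California', 'Profit(E)']
real_sorted = sorted(real_names)
print(real_sorted)
# ['Administration', 'Marketing Spend', 'Profit(E)', 'R&D Spend', 'State_California']
```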

Edit: a more general version that sorts the column list with a dict and a lambda (shown with the real column names from the comments):

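import pandas as pd

# map each original column name to its position in df
position = {col: i for i, col in enumerate(df.columns)}
# one-hot encode the State column (the dummy columns are appended at the end)
df1 = pd.get_dummies(df, columns=['State'])
# sort the new column names by the position of the column they came from:
# 'State_Florida'.split('_')[0] -> 'State' (assumes original names contain no '_')
your_order = sorted(df1.columns, key=lambda name: position[name.split('_')[0]])
df1[your_order]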

   R&D Spend  Administration  Marketing Spend  State_California  State_Florida  State_NewYork  Profit(E)
0  165349.20       136897.80        471784.10                 0              0              1  192261.83
1  162597.70       151377.59        443898.53                 1              0              0  191792.06
2  153441.51       101145.55        407934.54                 0              1              0  191050.39
3  144372.41       118671.85        383199.62                 0              0              1  182901.99
4  142107.34        91391.77        366168.42                 0              1              0  166187.94
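A sketch generalizing the sort-key idea into a helper, assuming you may want to encode several columns at once. The name get_dummies_in_place is hypothetical, not a pandas function. Instead of splitting on '_', it matches each dummy column against the known source column names by prefix, so source names that contain '_' do not break the lookup.

```python
import pandas as pd

def get_dummies_in_place(df, columns, prefix_sep='_'):
    """Hypothetical helper: one-hot encode `columns` (a list) and put each
    group of dummy columns where its source column used to be."""
    position = {col: i for i, col in enumerate(df.columns)}
    encoded = pd.get_dummies(df, columns=columns, prefix_sep=prefix_sep)

    def key(name):
        # Untouched columns keep their own original position
        if name in position:
            return position[name]
        # Dummy columns map back to the longest source column they start with
        source = max((c for c in columns if name.startswith(c + prefix_sep)), key=len)
        return position[source]

    # sorted() is stable, so each dummy group keeps get_dummies' own order
    return encoded[sorted(encoded.columns, key=key)]


# Sample data from the question, using the real column names from the comments
df = pd.DataFrame({
    'R&D Spend': [165349.20, 162597.70, 153441.51, 144372.41, 142107.34],
    'Administration': [136897.80, 151377.59, 101145.55, 118671.85, 91391.77],
    'Marketing Spend': [471784.10, 443898.53, 407934.54, 383199.62, 366168.42],
    'State': ['New York', 'California', 'Florida', 'New York', 'Florida'],
    'Profit(E)': [192261.83, 191792.06, 191050.39, 182901.99, 166187.94],
})

result = get_dummies_in_place(df, ['State'])
print(list(result.columns))
# ['R&D Spend', 'Administration', 'Marketing Spend', 'State_California', 'State_Florida', 'State_New York', 'Profit(E)']
```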
BENY
  • Assuming the column names are not alphabetical as in my sample, are there other ways? – Zale Goldart Oct 27 '17 at 03:51
  • These are the original column names, in order: R&D Spend, Administration, Marketing Spend, State, Profit(E). I want to arrange them as: R&D Spend, Administration, Marketing Spend, State_California, State_New York, State_Florida, Profit(E) – Zale Goldart Oct 27 '17 at 03:54
  • @ZaleGoldart all I can think of is splitting the original df and concatenating the pieces back together – BENY Oct 27 '17 at 04:04

A generalized solution for columns that may not be in sorted order: find the position of the column to dummify, then concatenate the pieces around it.

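import pandas as pd

j = df.columns.get_loc('D')           # integer position of the column to dummify

left = df.iloc[:, :j]                 # columns before D
dummies = pd.get_dummies(df[['D']])   # D_California, D_Florida, D_New York
right = df.iloc[:, j+1:]              # columns after D

pd.concat([left, dummies, right], axis=1)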

           A          B          C  D_California  D_Florida  D_New York          E
0  165349.20  136897.80  471784.10             0          0           1  192261.83
1  162597.70  151377.59  443898.53             1          0           0  191792.06
2  153441.51  101145.55  407934.54             0          1           0  191050.39
3  144372.41  118671.85  383199.62             0          0           1  182901.99
4  142107.34   91391.77  366168.42             0          1           0  166187.94
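The same split-and-concat idea wrapped in a small helper, sketched here on the real column names from the comments. The name dummify_at is hypothetical. One deliberate change: it passes the column as a Series with prefix=col, because pd.get_dummies on a DataFrame with no columns= argument only encodes object, string, or category columns and would pass a numeric code column through unchanged.

```python
import pandas as pd

def dummify_at(df, col):
    """Hypothetical helper: replace `col` with its dummy columns at the same position."""
    j = df.columns.get_loc(col)                      # integer position of col
    left = df.iloc[:, :j]                            # columns before col
    dummies = pd.get_dummies(df[col], prefix=col)    # col_<value>, one per distinct value
    right = df.iloc[:, j + 1:]                       # columns after col
    return pd.concat([left, dummies, right], axis=1)


# Sample data from the question, using the real column names from the comments
df = pd.DataFrame({
    'R&D Spend': [165349.20, 162597.70, 153441.51, 144372.41, 142107.34],
    'Administration': [136897.80, 151377.59, 101145.55, 118671.85, 91391.77],
    'Marketing Spend': [471784.10, 443898.53, 407934.54, 383199.62, 366168.42],
    'State': ['New York', 'California', 'Florida', 'New York', 'Florida'],
    'Profit(E)': [192261.83, 191792.06, 191050.39, 182901.99, 166187.94],
})

result = dummify_at(df, 'State')
print(list(result.columns))
```

Because it never sorts anything, the result does not depend on how the other column names compare as strings.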
piRSquared

Not sure if there is a better way, but selecting the columns explicitly in the order you want works:

col = ['R&D Spend', 'Administration', 'Marketing Spend', 'State_California', 'State_New York', 'State_Florida', 'Profit(E)']

df = df.loc[:, col]
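If typing the list by hand is the concern, it can be derived from the original column order instead. A minimal sketch, assuming a single encoded column named 'State' as in the comments: the dummy columns are whatever get_dummies added, spliced in where 'State' used to be. The dummy columns come out in get_dummies' alphabetical order rather than the hand-picked order above.

```python
import pandas as pd

# Sample data from the question, using the real column names from the comments
df = pd.DataFrame({
    'R&D Spend': [165349.20, 162597.70, 153441.51, 144372.41, 142107.34],
    'Administration': [136897.80, 151377.59, 101145.55, 118671.85, 91391.77],
    'Marketing Spend': [471784.10, 443898.53, 407934.54, 383199.62, 366168.42],
    'State': ['New York', 'California', 'Florida', 'New York', 'Florida'],
    'Profit(E)': [192261.83, 191792.06, 191050.39, 182901.99, 166187.94],
})

encoded = pd.get_dummies(df, columns=['State'])

# The dummy columns are the ones that did not exist before encoding
dummy_cols = [c for c in encoded.columns if c not in df.columns]

# Walk the original order and splice the dummy columns in where 'State' was
cols = []
for c in df.columns:
    cols.extend(dummy_cols if c == 'State' else [c])

result = encoded.loc[:, cols]
print(cols)
```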
Vaishali