
I'm a bit puzzled about the column order that `DataFrame` produces.

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.random(size=(5,3)), 
                  columns={'a', 'b', 'c'})

Why is the default output:

df =
          c         a         b
0  0.325172  0.831253  0.151912
1  0.558476  0.177249  0.906136
2  0.516089  0.069013  0.370251
3  0.440246  0.154116  0.494690
4  0.793981  0.409526  0.885879

and not the ordered list of columns ('a', 'b', 'c')?

(Python 3.6, Pandas 0.23)

Arnold Klein
  • 4
    because you passed a `set` of column names, if you passed a `list` it would preserve the order – EdChum Jun 01 '18 at 11:22

1 Answer


You passed a set of column names, and a set doesn't preserve the order in which you wrote its elements. If you pass a list instead, the columns come out in the order you want:

In[32]:
df = pd.DataFrame(data=np.random.random(size=(5,3)), 
                  columns=['a', 'b', 'c'])
df

Out[32]: 
          a         b         c
0  0.227711  0.410568  0.795012
1  0.624751  0.708471  0.152641
2  0.901483  0.967297  0.884749
3  0.353622  0.220706  0.031015
4  0.628634  0.128421  0.679261
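
If the column names really do arrive as a set, one option is to turn them into a list with a known order before building the frame. The following is a minimal sketch, assuming alphabetical order is what you want; the variable names `names` and `ordered_names` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical starting point: the column names arrive as a set,
# so their iteration order is not something we control.
names = {'c', 'a', 'b'}

# sorted() always returns a list, here in alphabetical order.
ordered_names = sorted(names)

df = pd.DataFrame(data=np.random.random(size=(5, 3)),
                  columns=ordered_names)
print(list(df.columns))  # ['a', 'b', 'c']
```

Sorting only helps when a sort order matches the layout you need. For any other order, write the list out explicitly.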
EdChum
  • thanks :) I was sure that the column names should be a set. Need to read more. – Arnold Klein Jun 01 '18 at 11:24
  • the [`DataFrame`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) ctor's `columns` parameter accepts `Index` or array-like args, so there really isn't any need to pass a `set` here – EdChum Jun 01 '18 at 11:25
  • 1
    This would be a good use for an OrderedSet construct, but this [doesn't exist](https://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set) yet in Python. – jpp Jun 01 '18 at 11:38
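
For what it's worth, a common stand-in for an ordered set is `dict.fromkeys`, which drops duplicates while keeping first-seen order. This is a rough sketch, not a built-in ordered set; `raw_names` and `unique_names` are hypothetical names. Note that dict ordering is guaranteed by the language only from Python 3.7 on; in CPython 3.6 it is an implementation detail.

```python
# Hypothetical input: column names containing a duplicate,
# listed in the order we want to keep.
raw_names = ['b', 'a', 'b', 'c']

# dict keys are unique and keep insertion order (guaranteed from
# Python 3.7), so this removes duplicates without reordering.
unique_names = list(dict.fromkeys(raw_names))
print(unique_names)  # ['b', 'a', 'c']
```

The resulting list can be passed to `columns=` like any other list.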