I am creating an initial pandas DataFrame to store results generated by other code, e.g.

result = pd.DataFrame({'date': datelist, 'total': [0]*len(datelist), 
                       'TT': [0]*len(datelist)})

where datelist is a predefined list. Other code will then output a number for total and TT for each date, which I will store in the result DataFrame.

So I want the first column to be date, the second total and the third TT. However, pandas automatically reorders the columns alphabetically to TT, date, total at creation. While I can manually reorder them afterwards, I wonder if there is an easier way to achieve this in one step.

I figured I can also do

result = pd.DataFrame(np.transpose([datelist, [0]*len(datelist), [0]*len(datelist)]),
                      columns = ['date', 'total', 'TT'])

but it somehow also looks tedious. Any other suggestions?

Alicia Garcia-Raboso
hurrikale
  • If I use `df = pd.DataFrame(columns = ['b','a'])` it will retain the order, but if I use `df = pd.DataFrame(columns = {'b','a'})`, the order of the column names changes to `'a' 'b'`. Any reason behind this? – Jia Gao Jan 29 '20 at 14:03

3 Answers

You can pass the (correctly ordered) list of columns as the `columns` parameter of the constructor, or use an OrderedDict:

# option 1: pass the desired column order explicitly
result = pd.DataFrame({'date': datelist, 'total': [0]*len(datelist),
                       'TT': [0]*len(datelist)}, columns=['date', 'total', 'TT'])

# option 2: build an OrderedDict, which keeps insertion order
import collections
od = collections.OrderedDict()
od['date'] = datelist
od['total'] = [0]*len(datelist)
od['TT'] = [0]*len(datelist)
result = pd.DataFrame(od)
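
To check the two options side by side, here is a minimal self-contained sketch. The `datelist` values are made up for illustration, and the OrderedDict is built from key/value pairs rather than by item assignment, which is only a matter of taste.

```python
import collections

import pandas as pd

# Hypothetical sample data standing in for the asker's datelist.
datelist = ['2016-10-01', '2016-10-02', '2016-10-03']
zeros = [0] * len(datelist)

# Option 1: the columns argument fixes the order, whatever the dict ordering is.
result1 = pd.DataFrame({'date': datelist, 'total': zeros, 'TT': zeros},
                       columns=['date', 'total', 'TT'])

# Option 2: an OrderedDict built from (key, value) pairs keeps insertion order.
od = collections.OrderedDict([('date', datelist),
                              ('total', zeros),
                              ('TT', zeros)])
result2 = pd.DataFrame(od)

print(list(result1.columns))
print(list(result2.columns))
```

Both frames should list their columns as date, total, TT.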
wonce
  • I was going to suggest an `OrderedDict` but it doesn't work, it just gets cast to a `dict` probably. I didn't get reproducible order with it. – Andras Deak -- Слава Україні Oct 04 '16 at 22:22
  • It should work, pandas explicitly checks for it: https://github.com/pydata/pandas/blob/master/pandas/core/frame.py#L397 – wonce Oct 04 '16 at 22:23
  • Haha, you're right, I completely screwed up my `OrderedDict` definition:) Thanks, and sorry. – Andras Deak -- Слава Україні Oct 04 '16 at 22:36
  • As the other answer suggested, if I do `result = pd.DataFrame({'date': datelist, 'total': [0]*len(datelist), 'TT': [0]*len(datelist)}, columns=['date', 'total', 'TT'])`, as in your first line, then it already seems to give the right order, so the rest is not necessary? Or am I missing something? – hurrikale Oct 04 '16 at 23:04
  • If I simply copy and paste all your code, it actually only gave me a dataframe with one column `date`. – hurrikale Oct 04 '16 at 23:07
  • Those were meant as two alternatives. I've clarified it now. – wonce Oct 04 '16 at 23:07
  • I see! So it seems like I need to either explicitly add the `columns` argument in `DataFrame` or create an OrderedDict and insert each component separately. Thanks! – hurrikale Oct 04 '16 at 23:13
  • Yes. You could also initialize the OrderedDict this way, but I consider it less readable: http://stackoverflow.com/a/25480206/3257586 – wonce Oct 04 '16 at 23:18
  • Additional warning: if you have a dict of dicts, they *both* need to be OrderedDict (I just spent 10 minutes trying to understand why it wasn't working for me... ^_^; ) – RobM Jul 29 '19 at 05:48
  • I am creating a DataFrame from a Django table. Functionally there is no issue, but I find that the columns are reordered (I think in alphabetical order). This sometimes causes heartburn when the data is to be displayed. Can this ordering "functionality" be **stopped**? – Love Putin Not War Jun 07 '20 at 14:06
result = pd.DataFrame({'date': [23,24], 'total': 0,
                       'TT': 0},columns=['date','total','TT'])
python必须死

Use pandas >= 0.23 in combination with Python >= 3.6.

result = pd.DataFrame({'date': datelist, 'total': [0]*len(datelist), 'TT': [0]*len(datelist)})

retains the dict's insertion order when creating a DataFrame (or Series) from a dict, provided you use pandas 0.23.0 or later together with Python 3.6 or later.

See https://pandas.pydata.org/pandas-docs/version/0.23.0/whatsnew.html#whatsnew-0230-api-breaking-dict-insertion-order.
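
A quick way to confirm this on your own setup is to print the interpreter and pandas versions next to the resulting column order. This is only a sketch with made-up sample dates; on older pandas or Python versions the columns may still come out sorted.

```python
import sys

import pandas as pd

# Hypothetical sample dates standing in for the asker's datelist.
datelist = ['2016-10-01', '2016-10-02']

# Plain dict, no columns argument: relies on dict insertion order.
result = pd.DataFrame({'date': datelist,
                       'total': [0] * len(datelist),
                       'TT': [0] * len(datelist)})

# Show which versions produced the order printed below.
print(sys.version_info[:2], pd.__version__)
print(list(result.columns))
```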

kadee
  • There is one caveat: it works fine for a `dict` of `list`s, but not for a `list` of `dict`s. Even if all the dicts have the same insertion order, the columns are still sorted alphabetically. (Tested with `pandas` 0.24.2 and Python 3.7) – hugovdberg May 10 '19 at 12:17
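
For a list of dicts, one way to avoid depending on the pandas version is to pass `columns` explicitly, as in the first answer. A minimal sketch with made-up rows:

```python
import pandas as pd

# Hypothetical rows; each dict is one record.
rows = [
    {'date': '2016-10-01', 'total': 0, 'TT': 0},
    {'date': '2016-10-02', 'total': 0, 'TT': 0},
]

# The explicit columns list decides the order, independent of key order.
result = pd.DataFrame(rows, columns=['date', 'total', 'TT'])
print(list(result.columns))
```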