Column order in pandas.concat

Question

I do as below:

data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
frames = [data1, data2]
data = pd.concat(frames)
data


   a    b
0   2   1
1   2   1
2   2   1
0   2   1
1   2   1
2   2   1

The data column order is in alphabet order. Why is it so? and how to keep the original order?

Possible duplicate of [How to change the order of DataFrame columns?](http://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns) — albert, Aug 19 '16 at 20:10
Possible duplicate of [Preserving Column Order - Python Pandas and Column Concat](http://stackoverflow.com/questions/32533944/preserving-column-order-python-pandas-and-column-concat) — ayhan, Aug 19 '16 at 20:18

score 20 · Accepted Answer · edited Nov 22 '20 at 15:24

20

You are creating DataFrames out of dictionaries. Dictionaries are a unordered which means the keys do not have a specific order. So

d1 = {'key_a': 'val_a', 'key_b': 'val_b'}

and

d2 = {'key_b': 'val_b', 'key_a': 'val_a'}

are (probably) the same.

In addition to that I assume that pandas sorts the dictionary's keys descending by default (unfortunately I did not find any hint in the docs in order to prove that assumption) leading to the behavior you encountered.

So the basic motivation would be to resort / reorder the columns in your DataFrame. You can do this as follows:

import pandas as pd

data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
frames = [data1, data2]
data = pd.concat(frames)

print(data)

cols = ['b' , 'a']
data = data[cols]

print(data)

edited Nov 22 '20 at 15:24

Gulzar

23,452
27
113
201

answered Aug 19 '16 at 20:18

albert

8,027
10
48
84

Will this "just work" in Python 3.7? Dicts are guaranteed to maintain their order. – BallpointBen Aug 27 '18 at 15:46
As an improvement, you don't have to manually specify the columns. You can just do... `combined = pd.concat([df1, df2])[df1.columns]`. This does assume you already have a data frame with columns in the order you want. But that was my case. – bwest87 Jan 27 '19 at 20:22

score 10 · Answer 2 · answered Jul 18 '19 at 14:54

10

Starting from version 0.23.0, you can prevent the concat() method to sort the returned DataFrame. For example:

df1 = pd.DataFrame({ 'a' : [1, 1, 1], 'b' : [2, 2, 2]})
df2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
df = pd.concat([df1, df2], sort=False)

A future version of pandas will change to not sort by default.

answered Jul 18 '19 at 14:54

Michael H.

535
6
11

1

To demonstrate `sort=False` vs `sort=True` the last line should be `df = pd.concat([df2, df1], sort=False)`. Then the columns will be in different order depending on `sort`. – SergiyKolesnikov Jan 05 '20 at 20:33

score 5 · Answer 3 · edited Jul 13 '18 at 09:10

5

def concat_ordered_columns(frames):
    columns_ordered = []
    for frame in frames:
        columns_ordered.extend(x for x in frame.columns if x not in columns_ordered)
    final_df = pd.concat(frames)    
    return final_df[columns_ordered]       

# Usage
dfs = [df_a,df_b,df_c]
full_df = concat_ordered_columns(dfs)

This should work.

edited Jul 13 '18 at 09:10

user5305519

3,008
4
26
44

answered Jan 02 '18 at 16:57

Philip Zelitchenko

51
1
2

Thanks, this was nice to have for my case where the DataFrames were being created from already-existing CSV files. – Jon Apr 09 '18 at 23:20

score 3 · Answer 4 · answered Aug 19 '16 at 20:28

You can create the original DataFrames with OrderedDicts

from collections import OrderedDict

odict = OrderedDict()
odict['b'] = [1, 1, 1]
odict['a'] = [2, 2, 2]
data1 = pd.DataFrame(odict)
data2 = pd.DataFrame(odict)
frames = [data1, data2]
data = pd.concat(frames)
data


    b    a
0   1    2
1   1    2
2   1    2
0   1    2
1   1    2
2   1    2

score 3 · Answer 5 · answered Jul 13 '18 at 10:01

you can also specify the order like this :

import pandas as pd

data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
listdf = [data1, data2]
data = pd.concat(listdf)
sequence = ['b','a']
data = data.reindex(columns=sequence)

score 2 · Answer 6 · answered Aug 20 '19 at 13:13

2

Simplest way is firstly make the columns same order then concat:

df2=df2[df1.columns]
df=pd.concat((df1,df2),axis=0)

answered Aug 20 '19 at 13:13

Emre Tatbak

103
3
11

Column order in pandas.concat

6 Answers6

Linked