52

How can I create a DataFrame from multiple numpy arrays, Pandas Series, or Pandas DataFrame's while preserving the order of the columns?

For example, I have these two numpy arrays and I want to combine them as a Pandas DataFrame.

foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )

If I do this, the bar column would come first because dict doesn't preserve order.

pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } )

    bar foo
0   4   1
1   5   2
2   6   3

I can do this, but it gets tedious when I need to combine many variables.

pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) }, columns = [ 'foo', 'bar' ] )

EDIT: Is there a way to specify the variables to be joined and to organize the column order in one operation? That is, I don't mind using multiple lines to complete the entire operation, but I'd rather not having to specify the variables to be joined multiple times (since I will be changing the code a lot and this is pretty error prone).

EDIT2: One more point. If I want to add or remove one of the variables to be joined, I only want to add/remove in one place.

ceiling cat
  • 5,501
  • 9
  • 38
  • 51
  • In pandas versions >= 0.25.0 the order of the columns is preserved. All you need is: `pd.DataFrame({'foo': foo, 'bar': bar})` – rachwa Jun 05 '22 at 06:10

12 Answers12

36

Original Solution: Incorrect Usage of collections.OrderedDict

In my original solution, I proposed to use OrderedDict from the collections package in python's standard library.

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } ) )

   foo  bar
0    1    4
1    2    5
2    3    6

Right Solution: Passing Key-Value Tuple Pairs for Order Preservation

However, as noted, if a normal dictionary is passed to OrderedDict, the order may still not be preserved since the order is randomized when constructing the dictionary. However, a work around is to convert a list of key-value tuple pairs into an OrderedDict, as suggested from this SO post:

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> a = np.array( [ 1, 2, 3 ] )
>>> b = np.array( [ 4, 5, 6 ] )
>>> c = np.array( [ 7, 8, 9 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'a': pd.Series(a), 'b': pd.Series(b), 'c': pd.Series(c) } ) )

   a  c  b
0  1  7  4
1  2  8  5
2  3  9  6

>>> pd.DataFrame( OrderedDict( (('a', pd.Series(a)), ('b', pd.Series(b)), ('c', pd.Series(c))) ) )

   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
Community
  • 1
  • 1
Eddo Hintoso
  • 1,442
  • 14
  • 22
  • If you just pass a normal dict to `OrderedDict` you can still lose the order, it's lost as soon as you create the normal dict – Marius Apr 11 '16 at 03:50
  • @ceilingcat I've edited my answer. To make sure `OrderedDict` preserves your order I suggest constructing with key-value pair tuples. Hope that helps! – Eddo Hintoso Apr 11 '16 at 04:01
33

Use the columns keyword when creating the DataFrame:

pd.DataFrame({'foo': foo, 'bar': bar}, columns=['foo', 'bar'])

Also, note that you don't need to create the Series.

blokeley
  • 6,726
  • 9
  • 53
  • 75
15

To preserve column order pass in your numpy arrays as a list of tuples to DataFrame.from_items:

>>> df = pd.DataFrame.from_items([('foo', foo), ('bar', bar)])

   foo  bar
0    1    4
1    2    5
2    3    6

Update

From pandas 0.23 from_items is deprecated and will be removed. So pass the numpy arrays using from_dict. To use from_dict you need to pass the items as a dictionary:

>>> from collections import OrderedDict as OrderedDict
>>> df = pd.DataFrame.from_dict(OrderedDict(zip(['foo', 'bar'], [foo, bar])))

From python 3.7 you can depend on insertion order being preserved (see https://mail.python.org/pipermail/python-dev/2017-December/151283.html) so:

>>> df = pd.DataFrame.from_dict(dict(zip(['foo', 'bar'], [foo, bar])))

or simply:

>>> df = pd.DataFrame(dict(zip(['foo', 'bar'], [foo, bar])))
Vidhya G
  • 2,250
  • 1
  • 25
  • 28
  • 1
    This works great, but gives my a deprecation warning: `FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.` – kabdulla Aug 12 '18 at 10:26
  • @kabdulla Thank you for catching the deprecation. I have updated the answer. Thanks again. – Vidhya G Sep 06 '18 at 16:29
9

After having created your dataframe, you can simply reorder the columns the way you want by using

df= df[['foo','bar']]
tfv
  • 6,016
  • 4
  • 36
  • 67
1

I couldn't comment to ask, but how will you specify the order of the columns in the first place (since you can't with a regular dictionary)?

If you want to maintain an ordered dictionary:

from collections import OrderedDict
import numpy as np
import pandas as pd

data = OrderedDict()
data['foo'] = np.array([1, 2, 3])
data['bar'] = np.array([4, 5, 6])

df = pd.DataFrame(data)

If you just have a list of keys for order:

data = {key: value for key, value in data.iteritems()}
df = pd.concat(data.values(), keys=['foo', 'bar'], axis=1)

@tfv's answer is likely the most concise way to do what you want.

Eric
  • 71
  • 4
  • Your first answer works for me (I don't think the second one works though). Just to clarify, if I need to add/remove a column, I don't want to have to do this in multiple places because it's error prone. – ceiling cat Apr 11 '16 at 05:56
1
>>> pd.concat([pd.Series(eval(col), name=col) for col in ['foo', 'bar']], axis=1)
   foo  bar
0    1    4
1    2    5
2    3    6

This works using eval. Your list of column names must match the corresponding variable name.

>>> eval('foo')
array([1, 2, 3])
Alexander
  • 105,104
  • 32
  • 201
  • 196
0

This may be an other way to approach it:

foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )
stacked = np.vstack((x,y)).T
stacked
array([[1, 4],
       [2, 5],
       [3, 6]])

new_df = pd.DataFrame(stacked, columns = ['foo', 'bar'] )
new_df
   foo  bar
0   1   4
1   2   5
2   3   6
Joe T. Boka
  • 6,554
  • 6
  • 29
  • 48
0

Make the dataframe with just the data in it, and transpose it.

Then add the columns.

>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>     
>>> df = pd.DataFrame([foo, bar]).T
>>> df.columns = ['foo','bar']
>>> df
  foo bar
0  1   4
1  2   5
2  3   6
Saminfeld
  • 1
  • 1
0

Another sketchy solution might be to pass a X_ to the title of the column where X is the number of the order of the column:

pd.DataFrame( { '2_foo': pd.Series(foo), '1_bar': pd.Series(bar) } )

And after that you can use columns or something to rename the columns again! The less pythonic code in the world!!!

Good luck mates!

Borja_042
  • 1,071
  • 1
  • 14
  • 26
0

What I did is as follow:

# Creating list of dict
list_of_dicts = ({'key1':'valueA', 'key2':'valueB},{'key1':'valueC', 'key2':'valueD}')

#getting list of keys from the dict
keys_list = list(list_of_dicts.keys())

# and finally
df = pd.DataFrame(list_of_dicts, columns = keys_list)

Worked perfectly for me.

Bartosz Konieczny
  • 1,985
  • 12
  • 27
0

The question is very old but wish to provide my solution to the question of "Preserving the order of columns while reading a csv file into pandas data frame":

import numpy as np
import pandas as pd

# Get column count as a list
cols = np.arange(0, hmprice.shape[1])
df = pd.read_csv('train.csv', usecols=cols)
df.head()

Default Order of dataframe:
Preserved order of dataframe:

NOTE: The attribute usecols can take either column names or column indices; but pandas doesn't honor "any other order" of column names or column indices.

For example,

    df = pd.read_csv('train.csv', usecols=[1, 2, 3])<br/>
or
    df = pd.read_csv('train.csv', usecols=[3, 2, 1])<br/>

gives the same result.

0

Since Python3.7, dicts are ordered by default, so you can just use your dict source to preserve the column ordering:

data = [{'foo': 1, 'bar': 4}, {'foo': 2, 'bar': 5}, {'foo': 3, 'bar': 6}]
pd.DataFrame(data, columns=data[0])
Jon
  • 952
  • 1
  • 11
  • 17