
I want to load lists into the columns of a pandas DataFrame but cannot find a simple way to do it. The example below produces what I want using transpose(), but I would think that step is unnecessary:

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: x = np.linspace(0,np.pi,10)
In [4]: y = np.sin(x)
In [5]: data = pd.DataFrame(data=[x,y]).transpose()
In [6]: data.columns = ['x', 'sin(x)']
In [7]: data
Out[7]: 
          x        sin(x)
0  0.000000  0.000000e+00
1  0.349066  3.420201e-01
2  0.698132  6.427876e-01
3  1.047198  8.660254e-01
4  1.396263  9.848078e-01
5  1.745329  9.848078e-01
6  2.094395  8.660254e-01
7  2.443461  6.427876e-01
8  2.792527  3.420201e-01
9  3.141593  1.224647e-16

[10 rows x 2 columns]

Is there a way to directly load each list into a column to eliminate the transpose and insert the column labels when creating the DataFrame?

Steven C. Howell

3 Answers


Someone just recommended creating a dictionary from the data then loading that into the DataFrame like this:

In [8]: data = pd.DataFrame({'x': x, 'sin(x)': y})
In [9]: data
Out[9]: 
          x        sin(x)
0  0.000000  0.000000e+00
1  0.349066  3.420201e-01
2  0.698132  6.427876e-01
3  1.047198  8.660254e-01
4  1.396263  9.848078e-01
5  1.745329  9.848078e-01
6  2.094395  8.660254e-01
7  2.443461  6.427876e-01
8  2.792527  3.420201e-01
9  3.141593  1.224647e-16

[10 rows x 2 columns]

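Note that a dictionary was traditionally an unordered set of key-value pairs (Python only guarantees insertion order from version 3.7, and pandas follows it for column order as of 0.25.0). If you care about the column order and cannot rely on that, pass a list of the keys in the desired order via columns (you can also use this list to include only some of the dict entries):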

data = pd.DataFrame({'x': x, 'sin(x)': y}, columns=['x', 'sin(x)'])
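As a minimal sketch of the columns argument described above (the 'cos(x)' entry is a hypothetical addition for illustration, not part of the original data), the list both fixes the column order and drops any dict key it does not mention:

```python
import numpy as np
import pandas as pd

x = np.linspace(0, np.pi, 10)
y = np.sin(x)

# 'cos(x)' is a hypothetical extra entry, added only to show the subsetting.
raw = {'sin(x)': y, 'cos(x)': np.cos(x), 'x': x}

# columns= sets the column order and keeps only the listed keys.
data = pd.DataFrame(raw, columns=['x', 'sin(x)'])
print(list(data.columns))  # ['x', 'sin(x)']
```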
Steven C. Howell
  • You can specify the column order this way: data = pd.DataFrame({'x':x, 'sin(x)':y}, columns=['x','sin(x)']) – Dashing Adam Hughes Mar 12 '15 at 16:08
  • You're missing quotes in the dictionary initialization. – Alex Mar 12 '15 at 16:16
  • @StevenC.Howell As of pandas version 0.25.0: If data is a dict, column order follows insertion-order ([docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)) – theodosis Oct 02 '22 at 21:42

Here's another 1-line solution preserving the specified order, without typing x and sin(x) twice:

data = pd.concat([pd.Series(x,name='x'),pd.Series(y,name='sin(x)')], axis=1)
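One thing worth knowing about the concat approach, sketched here under the assumption that the inputs might not always share an index: pd.concat with axis=1 lines rows up by index label, not by position. With the default index on both Series this gives the same frame as the dict approach, but a Series carrying a different index (the `shifted` Series below is a made-up example) produces extra rows filled with NaN.

```python
import numpy as np
import pandas as pd

x = np.linspace(0, np.pi, 10)
y = np.sin(x)

# Both Series get the default index 0..9, so rows line up one-to-one.
from_concat = pd.concat(
    [pd.Series(x, name='x'), pd.Series(y, name='sin(x)')], axis=1
)
from_dict = pd.DataFrame({'x': x, 'sin(x)': y})
pd.testing.assert_frame_equal(from_concat, from_dict)

# Hypothetical case: the second Series is labelled 5..14 instead of 0..9.
shifted = pd.Series(y, index=range(5, 15), name='sin(x)')
misaligned = pd.concat([pd.Series(x, name='x'), shifted], axis=1)
print(misaligned.shape)  # (15, 2): the union of both indexes
```

If the inputs are plain lists or arrays, as in the question, the default index applies and this alignment never comes into play.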
drsealks
dabru

If you don't care about the column names, you can use this:

pd.DataFrame(zip(*[x,y]))

In terms of run time, it is as fast as the dict option, and both are much faster than using transpose.
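If you do want column names with the zip approach, a minimal sketch is to pass them through columns. The helper function names and the repeat count of 200 below are arbitrary choices for illustration, and the timings will vary with machine and pandas version, so treat the printed numbers as a rough check of the claim above and not as a benchmark.

```python
import timeit

import numpy as np
import pandas as pd

x = np.linspace(0, np.pi, 10)
y = np.sin(x)

# zip(x, y) yields one (x_i, y_i) tuple per row; list() materializes it
# so the constructor receives a plain list of row tuples.
data = pd.DataFrame(list(zip(x, y)), columns=['x', 'sin(x)'])
print(data.shape)  # (10, 2)


def via_transpose():
    return pd.DataFrame(data=[x, y]).transpose()


def via_dict():
    return pd.DataFrame({'x': x, 'sin(x)': y})


def via_zip():
    return pd.DataFrame(list(zip(x, y)))


# Rough timing comparison of the three constructions.
for build in (via_transpose, via_dict, via_zip):
    seconds = timeit.timeit(build, number=200)
    print(f"{build.__name__}: {seconds:.4f} s for 200 runs")
```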

Oren Matar