Are there built-in ways to construct/deconstruct a dataframe from/to a Python list-of-Python-lists?
For the constructor I'm looking for (let's call it make_df for now), I want to be able to initialize a dataframe from literal values, including columns of arbitrary types, in an easily readable form, like this:
df = make_df([[9.75, 1],
              [6.375, 2],
              [9., 3],
              [0.25, 1],
              [1.875, 2],
              [3.75, 3],
              [8.625, 1]],
             ['d', 'i'])
For the deconstructor, I essentially want to recover from a dataframe df the arguments one would need to pass to such a make_df to re-create df.
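One candidate worth checking for the deconstructor is DataFrame.itertuples. The sketch below is a minimal, hypothetical helper (unmake_df_via_itertuples is my own name, not a pandas API), assuming a reasonably recent pandas in which iterating over a numeric column yields plain Python scalars. To stay within the constructor forms listed below, it builds the example frame from a dict.

```python
import pandas as pd

# The example frame, built with the dict form of the constructor.
df = pd.DataFrame({'d': [9.75, 6.375, 9., 0.25, 1.875, 3.75, 8.625],
                   'i': [1, 2, 3, 1, 2, 3, 1]})


def unmake_df_via_itertuples(dataframe):
    """Hypothetical helper: return (rows, columns) for `dataframe`.

    itertuples(index=False, name=None) yields one plain tuple per row,
    assembled column by column, so each value keeps its own column's type
    instead of being coerced to a dtype shared by all columns.
    """
    rows = [list(row) for row in dataframe.itertuples(index=False, name=None)]
    return rows, list(dataframe.columns)


rows, columns = unmake_df_via_itertuples(df)
print(rows[:2], columns)
```

The resulting (rows, columns) pair has the same shape as the arguments make_df takes.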
AFAIK,
- officially at least, the pandas.DataFrame constructor accepts only a numpy ndarray, a dict, or another DataFrame (and not a simple Python list-of-lists) as its first argument;
- the pandas.DataFrame.values property does not preserve the original data types.
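The first point may depend on the pandas version: recent pandas documentation lists an iterable among the accepted types for the data argument. Assuming the installed version accepts an iterable of rows, a minimal check is to pass the list of lists directly, together with the columns keyword, and inspect the inferred dtypes:

```python
import pandas as pd

values = [[9.75, 1],
          [6.375, 2],
          [9., 3],
          [0.25, 1],
          [1.875, 2],
          [3.75, 3],
          [8.625, 1]]
columns = ['d', 'i']

# Each inner list is treated as one row; `columns=` supplies the labels.
df = pd.DataFrame(values, columns=columns)

# pandas infers a dtype per column, so 'i' should remain an integer column.
print(df.dtypes)
```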
I can roll my own functions to do this (e.g., see below), but I would prefer to stick to built-in methods, if available. (The Pandas API is pretty big, and some of its names are not what I would expect, so it is quite possible that I have missed one or both of these functions.)
FWIW, below is a hand-rolled version of what I described above, minimally tested. (I doubt that it would be able to handle every possible corner case.)
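import collections as co

import pandas as pd
import pandas.testing as pdt  # pandas.util.testing is deprecated (removed in pandas 2.0)

def make_df(values, columns):
    # Transpose the row-oriented `values` into one list per column, keyed
    # by column label, so pandas infers a dtype for each column separately.
    return pd.DataFrame(co.OrderedDict([(columns[i],
                                         [row[i] for row in values])
                                        for i in range(len(columns))]))

def unmake_df(dataframe):
    # Rebuild the row-oriented list of lists one cell at a time, so each
    # value is read from its own (typed) column.
    columns = list(dataframe.columns)
    return ([[dataframe[c][i] for c in columns] for i in dataframe.index],
            columns)

values = [[9.75, 1],
          [6.375, 2],
          [9., 3],
          [0.25, 1],
          [1.875, 2],
          [3.75, 3],
          [8.625, 1]]
columns = ['d', 'i']
df = make_df(values, columns)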
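For what it's worth, the column-by-column transposition inside make_df can also be written with zip(*values). The sketch below (make_df_zip is a hypothetical name) assumes Python 3.7+, where plain dicts preserve insertion order, so OrderedDict is not needed. One behavioral difference: given an empty values list, it produces a frame with no columns, whereas make_df keeps the column labels.

```python
import pandas as pd


def make_df_zip(values, columns):
    """Hypothetical variant of make_df built on zip.

    zip(*values) transposes the list of rows into one tuple per column;
    pairing those tuples with the labels gives a {label: column} dict,
    from which pandas infers a dtype for each column separately.
    """
    return pd.DataFrame({label: list(column)
                         for label, column in zip(columns, zip(*values))})


df = make_df_zip([[9.75, 1], [6.375, 2], [9., 3]], ['d', 'i'])
print(df.dtypes)
```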
Here's what the call to make_df above produced:
>>> df
       d  i
0  9.750  1
1  6.375  2
2  9.000  3
3  0.250  1
4  1.875  2
5  3.750  3
6  8.625  1
A simple check of the round trip[1]:
>>> df == make_df(*unmake_df(df))
True
>>> (values, columns) == unmake_df(make_df(*(values, columns)))
True
BTW, this is an example of the loss of the original values' types:
>>> df.values
array([[ 9.75 ,  1.   ],
       [ 6.375,  2.   ],
       [ 9.   ,  3.   ],
       [ 0.25 ,  1.   ],
       [ 1.875,  2.   ],
       [ 3.75 ,  3.   ],
       [ 8.625,  1.   ]])
Notice how the values in the second column are no longer integers, as they were originally.
Hence,
>>> df == make_df(df.values, columns)
False
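The loss happens because a DataFrame stores one dtype per column, whereas .values (like .to_numpy()) has to return a single two-dimensional NumPy array, so all columns are upcast to one common dtype: float64 for a float column combined with an int column. A minimal illustration:

```python
import pandas as pd

df = pd.DataFrame({'d': [9.75, 6.375, 9.],
                   'i': [1, 2, 3]})

# Inside the frame, each column keeps its own dtype.
print(df.dtypes)

# A single 2-D array can have only one dtype, so the integer column
# is upcast to float64 alongside 'd'.
arr = df.values
print(arr.dtype)
```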
[1] In order to be able to use == to test for equality between dataframes above, I resorted to a little monkey-patching:
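import pandas as pd
import pandas.testing as pdt

def pd_DataFrame___eq__(self, other):
    # Two frames count as equal only if assert_frame_equal finds no
    # difference in values, dtypes, index/column types, or frame class.
    try:
        pdt.assert_frame_equal(self, other,
                               check_index_type=True,
                               check_column_type=True,
                               check_frame_type=True)
    except AssertionError:
        return False
    else:
        return True

pd.DataFrame.__eq__ = pd_DataFrame___eq__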
Without this hack, expressions of the form dataframe_0 == dataframe_1 would have evaluated to dataframes of element-wise booleans, not to simple boolean values.
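A possible alternative to monkey-patching __eq__ is DataFrame.equals, which returns a single boolean. According to its documentation, corresponding columns must have the same dtype, but the row and column indexes need not have the same type as long as their values are equal. That makes it somewhat less strict than the assert_frame_equal options used above. A minimal sketch:

```python
import pandas as pd

columns = ['d', 'i']
df = pd.DataFrame({'d': [9.75, 6.375, 9.], 'i': [1, 2, 3]})
same = pd.DataFrame({'d': [9.75, 6.375, 9.], 'i': [1, 2, 3]})

# Rebuilding from .values turns column 'i' into float64.
from_values = pd.DataFrame(df.values, columns=columns)

print(df.equals(same))         # same values and same dtypes
print(df.equals(from_values))  # same values, but 'i' now has a different dtype
```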