The parameters section of the documentation for DataFrame (as of pandas 2.0.0) begins:
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order. If a dict contains Series which have an index defined, it is aligned by its index. This alignment also occurs if data is a Series or a DataFrame itself. Alignment is done on Series/DataFrame inputs.
If data is a list of dicts, column order follows insertion-order.
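A minimal sketch of the dict and list-of-dicts behaviour that excerpt describes, assuming pandas 2.x; the Series `a` and `b` and all labels are made up for illustration. With a dict, the keys become column labels in insertion order and the Series are aligned on their index labels, which end up on the row axis. With a list of dicts, each dict becomes one row.

```python
import pandas as pd

# Two hypothetical Series whose indexes only partly overlap.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 30], index=["x", "z"])

# dict input: keys become column labels in insertion order ("b" before "a");
# the row labels are the union of the Series indexes, and each Series is
# aligned on them, so labels a Series lacks are filled with NaN.
by_dict = pd.DataFrame({"b": b, "a": a})
print(by_dict)

# list-of-dicts input: each dict is one row; columns appear in the order
# their keys are first seen, and missing keys become NaN.
by_records = pd.DataFrame([{"p": 1, "q": 2}, {"q": 3, "r": 4}])
print(by_records)
```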
The description lists the valid input types (i.e., ndarray, Iterable, dict, or DataFrame) but does not fully describe how the constructor will turn the data into a DataFrame. It seems like somewhat of a black box. Should I be able to predict, based on the documentation, that, say, passing a list containing a single Series and no other arguments will give a result that looks like Series.to_frame().T (although the dtypes may differ; see this answer and this one)?
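For the single-Series case, a quick check (assuming pandas 2.x; the Series `s` and `mixed` are made-up examples) shows both the layout match and where the dtypes can diverge. A list holding one named Series produces one row labelled with the Series name, with the Series index as the columns.

```python
import pandas as pd

# A homogeneous (all-int) Series with a name.
s = pd.Series([1, 2, 3], index=["a", "b", "c"], name="row0")

# A list containing one Series: the Series becomes a single row, its name
# becomes the row label and its index becomes the column labels.
from_list = pd.DataFrame([s])
transposed = s.to_frame().T
print(from_list.equals(transposed))

# With mixed element types the layouts still match, but the dtypes can
# differ: in pandas 2.x the list path re-infers a dtype for each object
# column, while to_frame().T transposes a single object column and keeps
# every resulting column as object.
mixed = pd.Series([1, "x"], index=["n", "t"], name="row0")
print(pd.DataFrame([mixed]).dtypes)
print(mixed.to_frame().T.dtypes)
```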
The purpose of this question is to solicit answers that classify the different ways of passing data to DataFrame() via data, according to how the constructor puts or massages the data into the DataFrame. It is necessarily a broad question, but there should be a finite number of cases, given that the constructor is, you know, implemented in code. I'm interested in this question and would be willing to dig through the source code a little to discover the answer; however, I think others with more experience may have insights to share here before I do that.
This is a single question about rules broadly, and I believe its answers belong together in one place. However, since it is broad, I will provide some specific sub-questions to get us started:
1. For iterables, what container and element combinations are valid? Without needing to try it, should I be able to predict what will happen if I pass a list of DataFrames or a Series of Series? Which axis is used when a Series input is "aligned by its index"? Does the treatment depend at all on what its elements are?
2. How do the container and element types passed via data affect how the DataFrame will be put together? Should I be able to predict how the data will be aligned along the axes of the resulting DataFrame based on knowledge of data alone? I don't know if the answer is obvious, but in either case I do not see it documented.
3. If I think of a DataFrame as "a dict-like container for Series objects" (as the docs suggest), what are the intuitive rules governing how data gets interpreted (loosely) into keys and values?
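One way to gather observations for these sub-questions before reading the source is a small probe that builds a frame from each container/element combination and reports the resulting shape and axes, or the exception raised. The helper name `probe` and the sample objects are hypothetical, and results may vary across pandas versions; this is a sketch for experimentation, not a classification of the rules.

```python
import pandas as pd


def probe(make_frame):
    """Call make_frame() and summarise the DataFrame it returns.

    Returns the exception class name if construction fails; otherwise a
    dict with the shape, row labels, column labels and column dtypes.
    """
    try:
        df = make_frame()
    except Exception as exc:  # record the failure instead of stopping
        return type(exc).__name__
    return {
        "shape": df.shape,
        "index": list(df.index),
        "columns": list(df.columns),
        "dtypes": [str(t) for t in df.dtypes],
    }


# Hypothetical sample inputs with partly overlapping labels.
s1 = pd.Series([1, 2], index=["a", "b"], name="s1")
s2 = pd.Series([3, 4], index=["b", "c"], name="s2")
f1 = pd.DataFrame({"a": [1, 2]})

# Each case is wrapped in a lambda so that errors raised while building
# the input itself are also caught by probe().
cases = {
    "list of Series": lambda: pd.DataFrame([s1, s2]),
    "dict of Series": lambda: pd.DataFrame({"s1": s1, "s2": s2}),
    "list of DataFrames": lambda: pd.DataFrame([f1, f1]),
    "Series of Series": lambda: pd.DataFrame(pd.Series([s1, s2])),
}

results = {label: probe(make) for label, make in cases.items()}
for label, summary in results.items():
    print(f"{label}: {summary}")
```

Adding a combination is just another entry in `cases`. Since `probe` only records what happens, it complements reading the constructor code (`pandas/core/frame.py` and `pandas/core/internals/construction.py`) rather than replacing it.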
I'm open to suggestions for improving the question, but I do think it's a question that needs to be asked and I did not find a similar question on this site.