To address the question: is appending a row to a DataFrame more expensive than appending a column?
We need to take several factors into account, but the most important one is the internal physical data layout of a Pandas DataFrame.
The short and kind of naive answer:
If the table (aka DataFrame) is stored in a column-wise physical layout, then adding or fetching a column is faster than adding or fetching a row; if the table is stored in a row-wise physical layout, it's the other way around. By default, a Pandas DataFrame is usually stored column-wise (but NOT always). So in general, appending a row to a DataFrame is indeed more expensive than appending a column. You can think of a Pandas DataFrame as, roughly, a dict of columns.
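To make the "dict of columns" picture concrete, here is a minimal sketch (the frame and its column names are made up for illustration). Adding a column attaches one new array to the existing frame, whereas adding a row with pd.concat produces a new DataFrame in which every column has to be rebuilt at the new length.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: 5 rows, 2 columns.
df = pd.DataFrame({"a": np.arange(5), "b": np.arange(5) * 2.0})

# Appending a column: one new array is attached to the existing frame.
df["c"] = np.ones(len(df))

# Appending a row: pd.concat returns a brand-new DataFrame, so each
# column ends up in a freshly allocated array one element longer.
new_row = pd.DataFrame({"a": [5], "b": [10.0], "c": [1.0]})
df2 = pd.concat([df, new_row], ignore_index=True)

print(df.shape)   # (5, 3)
print(df2.shape)  # (6, 3)
print(df2 is df)  # False
```

This is also why growing a DataFrame row by row in a loop is usually discouraged: collecting the rows in a list and building the frame once avoids the repeated copying.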
A longer answer:
Pandas needs to choose a way to arrange the internal layout of a table in memory (such as a DataFrame of 10 rows and 2 columns). The two most common approaches are column-wise and row-wise.
Pandas is built on top of NumPy, and DataFrame and Series are built on top of NumPy arrays. But note that although a NumPy array is stored row-wise (C order) in memory by default, this is NOT necessarily the case for a Pandas DataFrame. How a DataFrame is stored depends on how it was created, cf. this post: https://krbnite.github.io/Memory-Efficient-Windowing-of-Time-Series-Data-in-Python-2-NumPy-Arrays-vs-Pandas-DataFrames/
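One way to look at this yourself is to check the contiguity flags of the arrays involved. The sketch below is only a rough probe: the helper name layout is made up, to_numpy() may return a copy rather than the internal block, and what it reports for the two DataFrames can differ between Pandas versions, so no expected output is shown for them.

```python
import numpy as np
import pandas as pd

# A plain NumPy array is row-major (C order) by default.
arr = np.arange(6, dtype=np.float64).reshape(3, 2)
arr_is_c = bool(arr.flags["C_CONTIGUOUS"])
# np.asfortranarray gives a column-major (Fortran order) version.
arr_f_is_f = bool(np.asfortranarray(arr).flags["F_CONTIGUOUS"])
print(arr_is_c, arr_f_is_f)  # True True

def layout(df):
    """Hypothetical helper: report contiguity of the 2-D array behind df."""
    flags = df.to_numpy().flags
    return {"C": bool(flags["C_CONTIGUOUS"]), "F": bool(flags["F_CONTIGUOUS"])}

# Same data, two ways of creating the DataFrame.
df_from_array = pd.DataFrame(arr, columns=["x", "y"])
df_from_dict = pd.DataFrame({"x": arr[:, 0], "y": arr[:, 1]})

print(layout(df_from_array))
print(layout(df_from_dict))
```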
It's actually quite natural that Pandas adopts a column-wise layout most of the time, because Pandas was designed as a data analysis tool that relies more heavily on column-oriented operations than on row-oriented ones. cf. https://www.stitchdata.com/columnardatabase/
In the end, the answer to the question of whether appending a row to a DataFrame is more expensive than appending a column also depends on caching, prefetching, etc. It is therefore a rather complicated question to answer, and the answer can depend on specific runtime conditions. But the most important factor is the data layout.
Answer from the authors of Pandas
The authors of Pandas actually address this point in their design documentation (cf. https://github.com/pydata/pandas-design/blob/master/source/internal-architecture.rst#what-is-blockmanager-and-why-does-it-exist):
So, to do anything row oriented on an all-numeric DataFrame, pandas
would concatenate all of the columns together (using numpy.vstack or
numpy.hstack) then use array broadcasting or methods like ndarray.sum
(combined with np.isnan to mind missing data) to carry out certain
operations.
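The approach described in that quote can be sketched with plain NumPy. This is only an illustration of the idea, not Pandas' actual implementation, and the columns dict with its values is made up: per-column arrays are stacked into one 2-D block, and a row-wise sum is then computed while treating NaN as missing.

```python
import numpy as np

# Hypothetical all-numeric "dict of columns", each column a 1-D array.
columns = {
    "a": np.array([1.0, 2.0, np.nan]),
    "b": np.array([10.0, np.nan, 30.0]),
}

# Stack the per-column arrays into one block of shape (n_columns, n_rows).
block = np.vstack(list(columns.values()))

# Row-wise sum: replace NaNs with 0.0, then sum across the column axis.
mask = np.isnan(block)
row_sums = np.where(mask, 0.0, block).sum(axis=0)

print(row_sums.tolist())  # [11.0, 2.0, 30.0]
```

The stacking step is the extra cost that a row-oriented operation pays when the data lives in separate per-column arrays.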