I am following this explanation of pandas's MultiIndex. When an index is composed of multiple columns, some columns are taken to be children of other columns.

In contrast, multiple column keys in a SQL table do not have such a relationship between columns. I can still sort by multiple columns at whim, including those of a MultiIndex.
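To make the sorting point concrete, here is a minimal sketch of what I mean. The toy data and the names `region`, `year`, and `sales` are made up for illustration; the only pandas calls used are `set_index`, `sort_index`, `reset_index`, and `sort_values`.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for this example.
df = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west"],
        "year": [2020, 2019, 2019, 2020],
        "sales": [10, 20, 30, 40],
    }
).set_index(["region", "year"])  # "region" becomes the outer (parent) level

# Sort with "region" as the primary key and "year" as the secondary key.
by_region = df.sort_index(level=["region", "year"])

# Re-sort with the pecking order reversed: "year" first, then "region".
# The index levels keep their declared order; only the row order changes.
by_year = df.sort_index(level=["year", "region"])

# The SQL-like equivalent: treat the keys as plain columns and sort them.
flat = df.reset_index().sort_values(["year", "region"])

print(by_region)
print(by_year)
```

Both sort orders work, which is why the declared level order looks arbitrary to me, at least as far as sorting goes.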

What purpose is served by the parent-child relationship in MultiIndex columns?

P.S. I found this post, but it doesn't really focus on the hierarchical nature of MultiIndex.

user36800
  • I think it ties into the fact that MultiIndexes can be used to represent higher-dimensional data (3D, 4D, etc.) as 2D data, simply by setting the levels in the MultiIndex to represent the dimensions. – cs95 Dec 28 '20 at 02:12
  • True that. But you don't actually need a parent-child relationship between the dimensions/columns to do this. Relational data has been doing this for decades. That's why I was wondering what is achieved by the hierarchical relationship. – user36800 Dec 29 '20 at 02:43
  • You don't think matrix dimensions are hierarchical in relationship? For example, "row-major" and "column-major" terms are used to indicate the dominant axis in array traversal. – cs95 Dec 29 '20 at 02:49
  • I don't. SQL has used multi-column keys for a long time, and there is no inherent pecking order in the columns. If you really wanted to, you can sort a table by several columns, with a pecking order between the columns, but you can then turn around and re-sort the table according to the key's columns, but specified in a different pecking order. I got used to thinking of multiple dimensions without an inherent pecking order, though you can impose whatever order you want in a table's sorting of rows. It's very arbitrary. – user36800 Dec 29 '20 at 06:02
  • Python, MATLAB, and Java have a pecking order in the axes, but that has more to do with how the data is physically laid out in memory for *full arrays*. Relational data, such as in a dataframe (called "tables" in MATLAB, Excel [1], and SQL), are like sparse matrices in that records exist only for points in the data cube that are not empty. There is no regular layout like a full array; the records can have an arbitrary order. So the pecking order of axes doesn't apply. So the question is whether hardwiring the axis order in Python's MultiIndex has other benefits. [1] `ListObject`s, not ranges. – user36800 Dec 31 '20 at 02:07
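
The two views in the comments above can be sketched as follows. This is only an illustration under assumed toy data (the names `x`, `y`, `z`, and `value` are hypothetical). It stores a sparse 3D "data cube" as records, pivots one level into columns to get a 2D table, and rearranges the level order after the fact.

```python
import pandas as pd

# Hypothetical sparse "data cube": only 3 of the 2 * 2 * 2 cells hold a record.
records = pd.DataFrame(
    {
        "x": [0, 0, 1],
        "y": [0, 1, 1],
        "z": [1, 0, 1],
        "value": [1.5, 2.5, 3.5],
    }
)

# Three index levels stand in for the three dimensions.
cube = records.set_index(["x", "y", "z"])["value"]

# Pivot the "z" level into columns: a 2D view in which empty cells become NaN.
table = cube.unstack("z")

# The level order can be changed afterwards, so it is not permanently fixed.
reordered = cube.reorder_levels(["z", "x", "y"]).sort_index()

# Take a cross-section on one level; the remaining levels index the result.
x0 = cube.xs(0, level="x")

print(table)
print(reordered)
print(x0)
```

The sketch does not settle whether the parent-child ordering has benefits beyond convenience. It only shows that the declared order can be rearranged with `reorder_levels`.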

0 Answers