Python: How to use pandas to combine several csv files (with same headers) into a 3d cube

Question

Essentially I am trying to create a 3-dimensional data frame that looks something like:

CSV1:
   IDX  Col_A  Col_B  Col_C
     1      1      2      3
     2      4      5      6
     .      .      .      .

CSV2:
   IDX  Col_A  Col_B  Col_C
     1     11     12     13
     2     14     15     16
     .      .      .      .

CSV3:
   IDX  Col_A  Col_B  Col_C
     1      9      8      7
     2      6      5      4
     .      .      .      .

The column headers are the same in every file, but the files may or may not have the same number of rows.

What I want to create is a 3-dimensional pandas data structure holding these CSV files, where, for example, the X axis is the ID of the CSV file (csv1, csv2, etc.), the Y axis is the columns, and the Z axis is the rows of each file. The order of the axes is flexible (X could just as well be the columns, and so on).

In other words, for example [1,1,1] would be the value in the 1st row of the 1st column in the 1st csv file, and [2,4,5] is the value in the 4th column in the 5th row in the 2nd csv file.

The order of the axes is not important, I guess: it can be [csv ID, column, row], [column, row, csv ID], or whatever.

In this way I can pull out slices from the different csv files to perform operations like the mean of the values, etc.

One reason I first thought of pandas is that I am aware of how powerful its slice processing is.

For example, if I wanted to get the "mean" value of the 1st row of the 1st column in all 3 csv files and write that value to the 1st row of the 1st column of a 4th csv file then I'm expecting to do something like df[4,1,1] = df[1:3,1,1].mean().

I'm guessing that this is not the right syntax but hopefully accurately expresses my intention.

Does anyone have any ideas on how to do this, or whether it is possible?

Many thanks.
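
For illustration, here is a minimal sketch of one way the intent above could be expressed in pandas alone: stack the frames with pd.concat so that the file ID becomes the outer level of a MultiIndex, then group by row ID to average across files. The in-memory CSV strings and the names csv_texts, frames, cube, and mean_frame are assumptions made for this example, not anything from the original post. With real files you would pass file paths to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the three CSV files in the question.
csv_texts = {
    "csv1": "IDX,Col_A,Col_B,Col_C\n1,1,2,3\n2,4,5,6\n",
    "csv2": "IDX,Col_A,Col_B,Col_C\n1,11,12,13\n2,14,15,16\n",
    "csv3": "IDX,Col_A,Col_B,Col_C\n1,9,8,7\n2,6,5,4\n",
}

# Read each file into its own DataFrame, indexed by the IDX column.
frames = {
    name: pd.read_csv(io.StringIO(text), index_col="IDX")
    for name, text in csv_texts.items()
}

# Stack the frames; the dict keys become the outer level of a MultiIndex,
# so each row is addressed by (csv, IDX) and each value by (csv, IDX, column).
cube = pd.concat(frames, names=["csv", "IDX"])

# A single cell: file csv2, row IDX 1, column Col_A.
cell = cube.loc[("csv2", 1), "Col_A"]

# Element-wise mean across all files. Rows missing from some files are
# averaged over the files that do contain them.
mean_frame = cube.groupby(level="IDX").mean()
```

The averaged frame could then be written out as the "4th CSV file" with mean_frame.to_csv, using whatever output path suits you. Because the row ID is part of the index, files with different numbers of rows need no special handling in this layout.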

Why use pandas instead of NumPy? It seems to me that creating a three-dimensional array would be much easier. You can also use the .mean functions in NumPy to do the exact same calculations that you would do in pandas. — George Adams, Feb 27 '21 at 16:45
Thanks for the suggestion, George. I'll look into that. I'm not very familiar with either pandas or numpy so I started with pandas :) — leslie, Feb 27 '21 at 16:58
Looks like xarray is the best way to go to get power similar to pandas with multi-dimensional support. Thanks, everyone :) — leslie, Mar 01 '21 at 15:44
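
As a rough sketch of the NumPy suggestion in the comments (xarray is not shown here), the frames can be aligned on a common row index and stacked into a plain 3-D array. The DataFrames below are hypothetical stand-ins for the loaded CSV files, and the extra third row in df3 is invented purely to show how unequal row counts could be handled. The axis order (file, row, column) is an arbitrary choice for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for DataFrames loaded from the CSV files.
# df3 has one extra, made-up row to illustrate unequal row counts.
df1 = pd.DataFrame({"Col_A": [1, 4], "Col_B": [2, 5], "Col_C": [3, 6]}, index=[1, 2])
df2 = pd.DataFrame({"Col_A": [11, 14], "Col_B": [12, 15], "Col_C": [13, 16]}, index=[1, 2])
df3 = pd.DataFrame({"Col_A": [9, 6, 3], "Col_B": [8, 5, 2], "Col_C": [7, 4, 1]}, index=[1, 2, 3])
frames = [df1, df2, df3]

# Build the union of all row labels so every frame can share one shape.
all_rows = frames[0].index
for frame in frames[1:]:
    all_rows = all_rows.union(frame.index)

# Reindex each frame onto the common rows (missing rows become NaN) and
# stack along a new leading axis: shape is (file, row, column).
arr = np.stack([f.reindex(all_rows).to_numpy(dtype=float) for f in frames])

# Mean across files for every (row, column) cell, ignoring missing values.
mean_2d = np.nanmean(arr, axis=0)
```

Here arr[0, 0, 0] is the first row of the first column of the first file, using zero-based positions. The trade-off against the pandas layout is that the column names and row labels are no longer attached to the array, so they have to be tracked separately.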

