
Essentially I am trying to create a 3-dimensional data frame that looks something like:

CSV1:
   IDX  Col_A  Col_B  Col_C
     1      1      2      3
     2      4      5      6
     .      .      .      .

CSV2:
   IDX  Col_A  Col_B  Col_C
     1     11     12     13
     2     14     15     16
     .      .      .      .

CSV3:
   IDX  Col_A  Col_B  Col_C
     1      9      8      7
     2      6      5      4
     .      .      .      .

The column headers are the same in every file, but the files may or may not have the same number of rows.

What I want to create is a 3-dimensional pandas data structure holding these CSV files, where, say, the X axis is the ID of the CSV file (csv1, csv2, etc.), the Y axis is the columns, and the Z axis is the rows in the CSV file. The order of the axes is flexible (X could just as well be the columns); that part is not important.

In other words, [1,1,1] would be the value in the 1st row of the 1st column of the 1st CSV file, and [2,4,5] would be the value in the 5th row of the 4th column of the 2nd CSV file.

Actually, the order of the axes is not important, I guess: it could be [csv ID, column, row] or [column, row, csv ID] or whatever.

This way I can pull out slices across the different CSV files and perform operations on them, such as taking the mean of the values.

One reason I first thought of pandas is that I am aware of how powerful its slicing is.

For example, if I wanted to get the mean of the 1st row of the 1st column across all 3 CSV files and write that value to the 1st row of the 1st column of a 4th CSV file, I would expect to do something like df[4,1,1] = df[1:3,1,1].mean().

I'm guessing that this is not the right syntax but hopefully accurately expresses my intention.
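
As a rough sketch of that intent (the file names and values here are made up to match the tables above, and stacking with pd.concat is only one possible approach, not necessarily the best one), something along these lines might work. The frames are stacked under a two-level (csv, IDX) index instead of a true third axis:

```python
import io
import pandas as pd

# Hypothetical stand-ins for the CSV files; real code would pass file paths to pd.read_csv.
csv_texts = {
    "csv1": "IDX,Col_A,Col_B,Col_C\n1,1,2,3\n2,4,5,6\n",
    "csv2": "IDX,Col_A,Col_B,Col_C\n1,11,12,13\n2,14,15,16\n",
    "csv3": "IDX,Col_A,Col_B,Col_C\n1,9,8,7\n2,6,5,4\n",
}

frames = {
    name: pd.read_csv(io.StringIO(text), index_col="IDX")
    for name, text in csv_texts.items()
}

# Stack the frames; the dict keys become the outer level of a (csv, IDX) MultiIndex.
stacked = pd.concat(frames, names=["csv", "IDX"])

# A single cell: file csv2, row IDX 1, column Col_A.
cell = stacked.loc[("csv2", 1), "Col_A"]

# Mean of every (IDX, column) cell across the files.
# Rows that are missing from some files are averaged over the files that have them.
mean_df = stacked.groupby(level="IDX").mean()

# Text of the "4th CSV file"; pass a path to to_csv() to write it to disk instead.
csv4_text = mean_df.to_csv()
```

Here the labels are the real IDX values and column names rather than positional numbers, so the equivalent of [2,1,1] is stacked.loc[("csv2", 1), "Col_A"].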

Does anyone have any ideas how to do this, or whether it is possible at all?

Many thanks.

leslie
  • Why use pandas instead of NumPy? Seems to me creating a three-dimensional array would be much easier. You can also use the .mean functions in NumPy to do the exact same calculations that you would do in pandas. – George Adams Feb 27 '21 at 16:45
  • Thanks for the suggestion, George. I'll look into that. I'm not very familiar with either pandas or numpy so I started with pandas :) – leslie Feb 27 '21 at 16:58
  • Looks like xarray is the best way to go to get power similar to pandas with multi-dimensional support. Thanks, everyone :) – leslie Mar 01 '21 at 15:44
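
For comparison, the NumPy route suggested in the comments might look roughly like the sketch below. The values are made up to match the tables in the question, with one extra row added to the third table to show unequal lengths. Padding with NaN is an assumption about how missing rows should be treated, not something the question specifies:

```python
import numpy as np

# Hypothetical values standing in for the three files (IDX column dropped).
tables = [
    np.array([[1, 2, 3], [4, 5, 6]], dtype=float),
    np.array([[11, 12, 13], [14, 15, 16]], dtype=float),
    np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]], dtype=float),  # one extra row
]

# Pad the shorter tables with NaN rows so that all tables have the same shape.
max_rows = max(t.shape[0] for t in tables)
padded = [
    np.pad(t, ((0, max_rows - t.shape[0]), (0, 0)), constant_values=np.nan)
    for t in tables
]

# 3-D array with axes [csv, row, column], all zero-based.
cube = np.stack(padded)

# Mean across the files for every (row, column) cell, ignoring the NaN padding.
mean_table = np.nanmean(cube, axis=0)
```

With this layout cube[1, 0, 0] is the first row of the first column in the second file, and mean_table plays the role of the 4th CSV file. Unlike pandas or xarray, the axes carry no labels, so the column names have to be tracked separately.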
