
I have a 3-dimensional numpy array with shape (z, x, y). z is a time dimension and x and y are coordinates.

I want to convert this to a multiindexed pandas.DataFrame. I want the row index to be the z dimension and each column to have values from a unique x, y coordinate (and so, each column would be multi-indexed).

The simplest case (not multi-indexed):

>>> array.shape
(500L, 120L, 100L)

>>> df = pd.DataFrame(array[:,0,0])

>>> df.shape
(500, 1)

I've been trying to pass the whole array into a multi-indexed DataFrame using pd.MultiIndex.from_arrays, but I'm getting an error:

NotImplementedError: > 1 ndim Categorical are not supported at this time

It looks like it should be fairly simple, but I can't figure it out.

Stephen Rauch
BioProg

2 Answers


I find that a Series with a MultiIndex is the most analogous pandas datatype for a numpy array with arbitrarily many dimensions (presumably 3 or more).

Here is some example code:

import pandas as pd
import numpy as np

time_vals = np.linspace(1, 50, 50)
x_vals = np.linspace(-5, 6, 12)
y_vals = np.linspace(-4, 5, 10)

measurements = np.random.rand(50,12,10)

#setup multiindex
mi = pd.MultiIndex.from_product([time_vals, x_vals, y_vals], names=['time', 'x', 'y'])

#connect multiindex to data and save as multiindexed Series
sr_multi = pd.Series(index=mi, data=measurements.flatten())

#pull out a dataframe of x, y at time=22
sr_multi.xs(22, level='time').unstack(level=0)

#pull out a dataframe of y, time at x=3
sr_multi.xs(3, level='x').unstack(level=1)
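
As a rough sketch of how the same Series could give the layout asked for in the question (time as the row index, one column per (x, y) pair), the x and y levels can be unstacked together. This reuses the toy data from the example above; the name `wide` is only illustrative.

```python
import numpy as np
import pandas as pd

# Same toy setup as in the example above
time_vals = np.linspace(1, 50, 50)
x_vals = np.linspace(-5, 6, 12)
y_vals = np.linspace(-4, 5, 10)
measurements = np.random.rand(50, 12, 10)

mi = pd.MultiIndex.from_product([time_vals, x_vals, y_vals], names=['time', 'x', 'y'])
sr_multi = pd.Series(measurements.flatten(), index=mi)

# Move the x and y levels into the columns:
# rows are indexed by time, columns by (x, y) pairs
wide = sr_multi.unstack(level=['x', 'y'])

print(wide.shape)  # (50, 120)
```

A single coordinate's time series is then available as `wide[(3.0, 1.0)]`, for example.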
Selah
  • Great answer to commonly asked question(s) about wrangling 3D numpy arrays into pandas. Much easier to understand than others I have seen. Bravo @Selah ! – Epimetheus Dec 14 '21 at 15:42

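I think you can use Panel, and then call to_frame to get a MultiIndex DataFrame. (Note that Panel was deprecated in pandas 0.20 and removed in 0.25, so the Panel examples here need an older pandas version.)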

np.random.seed(10)
arr = np.random.randint(10, size=(5,3,2))
print (arr)
[[[9 4]
  [0 1]
  [9 0]]

 [[1 8]
  [9 0]
  [8 6]]

 [[4 3]
  [0 4]
  [6 8]]

 [[1 8]
  [4 1]
  [3 6]]

 [[5 3]
  [9 6]
  [9 1]]]

df = pd.Panel(arr).to_frame()
print (df)
             0  1  2  3  4
major minor               
0     0      9  1  4  1  5
      1      4  8  3  8  3
1     0      0  9  0  4  9
      1      1  0  4  1  6
2     0      9  8  6  3  9
      1      0  6  8  6  1

Also transpose can be useful:

df = pd.Panel(arr).transpose(1,2,0).to_frame()
print (df)
             0  1  2
major minor         
0     0      9  0  9
      1      1  9  8
      2      4  0  6
      3      1  4  3
      4      5  9  9
1     0      4  1  0
      1      8  0  6
      2      3  4  8
      3      8  1  6
      4      3  6  1

Another possible solution with concat:

arr = arr.transpose(1,2,0)
# one key per 2-D slice: after the transpose there are arr.shape[0] frames
df = pd.concat([pd.DataFrame(x) for x in arr], keys=np.arange(arr.shape[0]))
print (df)
    0  1  2  3  4
0 0  9  1  4  1  5
  1  4  8  3  8  3
1 0  0  9  0  4  9
  1  1  0  4  1  6
2 0  9  8  6  3  9
  1  0  6  8  6  1

For an array with the shape from the question:

np.random.seed(10)
arr = np.random.randint(10, size=(500,120,100))
df = pd.Panel(arr).transpose(2,0,1).to_frame()
print (df.shape)
(60000, 100)

print (df.index.max())
(499, 119)
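
Since Panel is no longer available in current pandas, here is a minimal sketch of one way to get the same (major, minor) by columns layout with only reshape and MultiIndex.from_product. It is not the Panel-based method above, just an alternative under the assumption that the first two axes should become the row index and the last axis the columns; the names `n_major`, `n_minor` and `n_cols` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
arr = np.random.randint(10, size=(500, 120, 100))
n_major, n_minor, n_cols = arr.shape

# Row labels: every (major, minor) pair, in the same row-major order reshape uses
index = pd.MultiIndex.from_product(
    [np.arange(n_major), np.arange(n_minor)], names=['major', 'minor']
)

# Collapse the first two axes into rows; the last axis becomes the columns
df = pd.DataFrame(arr.reshape(n_major * n_minor, n_cols), index=index)

print(df.shape)  # (60000, 100)
```

Each element keeps its position, so `df.loc[(i, j), k]` corresponds to `arr[i, j, k]`.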
jezrael
  • Thanks! This is getting close. But the shape of the data is not right, I'm looking for 500 rows (as "major") and 0 and 1 as minor as you have in your initial example. But I'm getting 500 columns instead. I've tried different permutations of transpose but still not quite right. – BioProg Apr 15 '17 at 15:16
  • Do you need `500` rows in major, `120` or `100` in minor and `100` or `120` columns? – jezrael Apr 15 '17 at 15:21
  • maybe need `.transpose(1,0,2)` if `120` columns. – jezrael Apr 15 '17 at 15:22
  • Yes, I'm looking for 500 rows in major, 120 in minor and 100 columns. .transpose(1,0,2) doesn't do the trick. – BioProg Apr 15 '17 at 15:24
  • so need `.transpose(2,0,1)` – jezrael Apr 15 '17 at 15:26
  • Thank a lot, mate! – BioProg Apr 15 '17 at 15:33
  • It's worth noting that Pandas panel is deprecated. This [answer](https://stackoverflow.com/a/48482831/2205775) gives a solution directly with dataframes. – Renato Garcia Jul 26 '18 at 15:41
  • Concat worked but I needed to drop the keys arg: df = pd.concat([pd.DataFrame(x) for x in arr]) – ChrisDanger May 13 '22 at 16:36