I've got a multidimensional numpy array that I'm trying to stick into a pandas data frame. I'd like to flatten the array, and create a pandas index that reflects the pre-flattened array indices.
Note that I'm using 3D to keep the example small, but I'd like to generalize to at least 4D.
import numpy as np
import pandas as pd

A = np.random.rand(2,3,4)
array([[[ 0.43793885, 0.40078139, 0.48078691, 0.05334248],
        [ 0.76331509, 0.82514441, 0.86169078, 0.86496111],
        [ 0.75572665, 0.80860943, 0.79995337, 0.63123724]],
       [[ 0.20648946, 0.57042315, 0.71777265, 0.34155005],
        [ 0.30843717, 0.39381407, 0.12623462, 0.93481552],
        [ 0.3267771 , 0.64097038, 0.30405215, 0.57726629]]])
df = pd.DataFrame(A.flatten())
I'm trying to generate x/y/z columns like this:
           A  z  y  x
0   0.437939  0  0  0
1   0.400781  0  0  1
2   0.480787  0  0  2
3   0.053342  0  0  3
4   0.763315  0  1  0
5   0.825144  0  1  1
6   0.861691  0  1  2
7   0.864961  0  1  3
...
21  0.640970  1  2  1
22  0.304052  1  2  2
23  0.577266  1  2  3
I've tried setting this up using np.meshgrid, but I'm going wrong somewhere:
dimnames = ['z', 'y', 'x']
ranges = [ np.arange(x) for x in A.shape ]
ix = [ x.flatten() for x in np.meshgrid(*ranges) ]
for name, col in zip(dimnames, ix):
    df[name] = col
df = df.set_index(dimnames).squeeze()
This result looks somewhat sensible, but the indices are wrong:
df
z  y  x
0  0  0    0.437939
      1    0.400781
      2    0.480787
      3    0.053342
1  0  0    0.763315
      1    0.825144
      2    0.861691
      3    0.864961
0  1  0    0.755727
      1    0.808609
      2    0.799953
      3    0.631237
1  1  0    0.206489
      1    0.570423
      2    0.717773
      3    0.341550
0  2  0    0.308437
      1    0.393814
      2    0.126235
      3    0.934816
1  2  0    0.326777
      1    0.640970
      2    0.304052
      3    0.577266
print A[0,1,0]
0.76331508999999997
print df.loc[0,1,0]
0.75572665000000006
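A likely explanation for the mismatch above is that np.meshgrid defaults to indexing='xy', which swaps the first two axes of the grids it returns, so the flattened index columns no longer line up with A.flatten(). The short check below is only a sketch (written with Python 3 print calls, and using a plain shape tuple in place of A) that compares the grid shapes under both indexing modes:

```python
import numpy as np

shape = (2, 3, 4)  # same shape as A in the example above
ranges = [np.arange(n) for n in shape]

# Default indexing='xy' swaps the first two axes of the output grids.
xy = np.meshgrid(*ranges)
# indexing='ij' keeps the axes in the same order as the input ranges.
ij = np.meshgrid(*ranges, indexing='ij')

print(xy[0].shape)  # (3, 2, 4)
print(ij[0].shape)  # (2, 3, 4)
```

With indexing='ij' the grids have the same shape as A, so flattening them should give index columns in the same order as the flattened values.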
How can I create the index columns to reflect the shape of A?
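One possible approach, sketched here rather than offered as the definitive answer, is to skip meshgrid and build the index with pd.MultiIndex.from_product, which iterates the last level fastest and so should match the C-order flattening of the array. The helper name array_to_series is hypothetical, introduced only for this illustration, and the sketch assumes Python 3 and one dimension name per axis:

```python
import numpy as np
import pandas as pd

def array_to_series(arr, dimnames):
    """Hypothetical helper: flatten arr into a Series indexed by N-D position."""
    # from_product varies the last level fastest, matching C-order ravel().
    index = pd.MultiIndex.from_product(
        [range(n) for n in arr.shape], names=dimnames
    )
    return pd.Series(arr.ravel(), index=index, name='A')

A = np.random.rand(2, 3, 4)
s = array_to_series(A, ['z', 'y', 'x'])
print(s.loc[0, 1, 0] == A[0, 1, 0])  # True

# Turn the index levels back into ordinary z/y/x columns if needed.
df = s.reset_index()
```

Because the index is built from arr.shape, the same helper should work unchanged for 4D or higher, as long as dimnames has one entry per axis. Note that reset_index places the z/y/x columns before the A column.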