Convert multi-dimension Xarray into DataFrame - Python

Question

I have a big array with 4 dimensions, as follows:

>>> raw_data
<xarray.DataArray 'TRAC04' (time: 3, Z: 34, YC: 588, XC: 2160)>
[129548160 values with dtype=float32]
Coordinates: (12/15)
    iter       (time) int64 ...
  * time       (time) datetime64[ns] 2017-01-30T12:40:00 ... 2017-04-01T09:20:00
  * XC         (XC) float32 0.08333 0.25 0.4167 0.5833 ... 359.6 359.8 359.9
  * YC         (YC) float32 -77.98 -77.95 -77.91 -77.88 ... -30.02 -29.87 -29.72
  * Z          (Z) float32 -2.1 -6.7 -12.15 -18.55 ... -614.0 -700.0 -800.0
    rA         (YC, XC) float32 ...
    ...         ...
    maskC      (Z, YC, XC) bool ...
    maskCtrlC  (Z, YC, XC) bool ...
    rhoRef     (Z) float32 ...
    rLowC      (YC, XC) float32 ...
    maskInC    (YC, XC) bool ...
    rSurfC     (YC, XC) float32 ...
Attributes:
    standard_name:  TRAC04
    long_name:      Variable concentration
    units:          mol N/m^3

I want to transform it into a DataFrame with 5 columns: 'XC', 'YC', 'Z', 'time', 'TRAC04'.

I tried to follow this question like this:

import itertools
import pandas as pd

data = list(itertools.chain(*raw_data))
df = pd.DataFrame.from_records(data)

It runs, but I do not see anything being created in the environment. Furthermore, if I try to look at df with pd.head(df), it runs forever without returning any output.

In any case, I tried to save df, following this question, but this also runs without ever finishing:

np.savetxt(r'c:\data\DF_TRAC04.txt', df.values, fmt='%d')
df.to_csv(r'c:\data\DF_TRAC04.csv', header=None, index=None, sep=' ', mode='a')

score 1 · Accepted Answer · answered Jul 21 '22 at 21:09

I hope my answer can still help.

Let's first create some mock data with spatial variables x, y, z and a time variable t.

import numpy as np
import xarray as xr

val = np.arange(54).reshape(2,3,3,3)
xc = np.array([10, 20, 30])
yc = np.array([50, 60, 70])
zc = np.array([1000, 2000, 3000])
t  = np.array([0, 1])

da = xr.DataArray(
    val,
    coords={'time': t,
            'z': zc,
            'y': yc,
            'x': xc},
    dims=["time", "z", "y", "x"]
)

You will get the following DataArray:

<xarray.DataArray (time: 2, z: 3, y: 3, x: 3)>
array([[[[ 0,  1,  2],
         [ 3,  4,  5],
         [ 6,  7,  8]],

        [[ 9, 10, 11],
         [12, 13, 14],
         [15, 16, 17]],

        [[18, 19, 20], 
         [21, 22, 23],
         [24, 25, 26]]],


       [[[27, 28, 29],
         [30, 31, 32],
         [33, 34, 35]],

        [[36, 37, 38],
         [39, 40, 41],
         [42, 43, 44]],

        [[45, 46, 47],
         [48, 49, 50],
         [51, 52, 53]]]])
Coordinates:
  * time     (time) int64 0 1
  * z        (z) int64 1000 2000 3000
  * y        (y) int64 50 60 70
  * x        (x) int64 10 20 30

If you want to have a flat file representation of the DataArray, you can use

da.to_dataframe(name='value').reset_index()

and this is the result:

    time     z   y   x  value
0      0  1000  50  10      0
1      0  1000  50  20      1
2      0  1000  50  30      2
3      0  1000  60  10      3
4      0  1000  60  20      4
...
49     1  3000  60  20     49
50     1  3000  60  30     50
51     1  3000  70  10     51
52     1  3000  70  20     52
53     1  3000  70  30     53
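
Applied to the array in the question, the same call might look like the sketch below. The array here is a hypothetical miniature of raw_data, not the real data: it keeps the dimension names and one non-index coordinate (rA) from the question, with made-up values. As far as I know, to_dataframe() also turns non-index coordinates such as rA or maskC into columns, so the sketch drops them first with reset_coords(drop=True) to end up with exactly the five requested columns.

```python
import numpy as np
import xarray as xr

# Hypothetical miniature of the asker's array: same dimension names,
# plus one non-index coordinate (rA) like those listed in the question.
raw = xr.DataArray(
    np.arange(24, dtype="float32").reshape(2, 3, 2, 2),
    coords={
        "time": [0, 1],
        "Z": [-2.1, -6.7, -12.15],
        "YC": [-77.98, -77.95],
        "XC": [0.08333, 0.25],
        "rA": (("YC", "XC"), np.ones((2, 2), dtype="float32")),
    },
    dims=["time", "Z", "YC", "XC"],
    name="TRAC04",
)

# Drop coordinates that are not dimension indexes,
# so they do not show up as extra columns.
slim = raw.reset_coords(drop=True)

# The array already has a name, so to_dataframe() needs no name argument.
df = slim.to_dataframe().reset_index()

# Put the columns in the order requested in the question.
df = df[["XC", "YC", "Z", "time", "TRAC04"]]
print(df.head())
```

If the extra coordinates are wanted as columns, skipping the reset_coords step should keep them.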

For saving the DataFrame to an ASCII file without the index, use:

da.to_dataframe(name='value').reset_index().to_csv('dump.csv', index=False)
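
With roughly 130 million values, as in the question, building the whole DataFrame at once may need a lot of memory. One possible workaround, sketched below under that assumption, is to convert and append one time step at a time. The helper name dump_by_time and the output path are made up for illustration.

```python
import os
import tempfile

import numpy as np
import xarray as xr


def dump_by_time(da, path, name="value"):
    """Write the array to CSV one time step at a time (hypothetical helper)."""
    for i in range(da.sizes["time"]):
        # Indexing with a list keeps 'time' as a length-1 dimension,
        # so it still appears as a regular column after reset_index().
        step = da.isel(time=[i]).to_dataframe(name=name).reset_index()
        # Create the file with a header on the first step, then append.
        step.to_csv(path, mode="w" if i == 0 else "a",
                    header=(i == 0), index=False)


# Same mock data as above.
da = xr.DataArray(
    np.arange(54).reshape(2, 3, 3, 3),
    coords={"time": [0, 1],
            "z": [1000, 2000, 3000],
            "y": [50, 60, 70],
            "x": [10, 20, 30]},
    dims=["time", "z", "y", "x"],
)

csv_path = os.path.join(tempfile.gettempdir(), "dump_by_time.csv")
dump_by_time(da, csv_path)
```

Only one time slice is held as a DataFrame at any moment, at the cost of several smaller writes instead of one large one.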
