I am new to xarray and confused about how I am supposed to construct Datasets and DataArrays. I have xyz point data, and each point has 2 data values.

Below is my attempt, but I am receiving the error `ValueError: Could not convert tuple of form (dims, data[, attrs, encoding]): ... to Variable`. I believe this is telling me that `point_data1` and `point_data2` need to be 3-dimensional, but I am unsure how to do that in a way that makes sense for my use case.

import numpy as np
import xarray as xr

num_points = 20
point_locations = np.random.randint(99, size=(num_points, 3))
point_data1 = np.ones(num_points)
point_data2 = np.random.randint(5, size=num_points)

# Raises the ValueError: each variable is 1D but is declared with three dimensions
ds = xr.Dataset({'point_data1': (['x', 'y', 'z'], point_data1),
                 'point_data2': (['x', 'y', 'z'], point_data2)},
                coords={'x': point_locations[:, 0], 'y': point_locations[:, 1], 'z': point_locations[:, 2]})
JSoothe
  • You might find this question/answer useful: https://stackoverflow.com/questions/75278985/how-can-i-reshape-data-in-a-csv-into-a-structured-format. TLDR, you can use `unstack` if you can structure your index to be a multi-index (along the points dimension). – jhamman Feb 02 '23 at 23:47
  • Alternatively, you may just want to keep your data in a 1D format, where your variables and the (x, y, z) locations are stored as variables indexed by something like `point_id`. In this case, assign the position vectors the same way you’re assigning the data. This will avoid exploding your memory if the possible x/y/z space is large (or continuous) – Michael Delgado Feb 03 '23 at 06:01
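
The 1D layout suggested in the comment above could look roughly like the sketch below. It is only one way to read that suggestion: the dimension name `point_id` comes from the comment, while the names `ds_points` and `low_points`, the seeded generator, and the choice to store x, y, z as non-index coordinates are assumptions made for illustration.

```python
import numpy as np
import xarray as xr

num_points = 20
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
point_locations = rng.integers(99, size=(num_points, 3))
point_data1 = np.ones(num_points)
point_data2 = rng.integers(5, size=num_points)

# Every array shares a single dimension, "point_id", so nothing is expanded
# onto a 3D grid and memory use stays proportional to the number of points.
ds_points = xr.Dataset(
    {
        "point_data1": ("point_id", point_data1),
        "point_data2": ("point_id", point_data2),
    },
    coords={
        "point_id": np.arange(num_points),
        "x": ("point_id", point_locations[:, 0]),
        "y": ("point_id", point_locations[:, 1]),
        "z": ("point_id", point_locations[:, 2]),
    },
)

# Select points by location with a boolean mask over the 1D coordinates.
low_points = ds_points.where(ds_points["z"] < 50, drop=True)
```

With this layout, location-based selection is done with masks such as the one above rather than with `.sel(x=..., y=..., z=...)`, since x, y, and z are not dimensions here.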

1 Answer

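Building a pandas DataFrame indexed by (x, y, z) and converting it with `Dataset.from_dataframe` accomplished what I wanted. The result is a dense grid over the unique x, y, and z values, with NaN wherever there is no point. `from_dataframe` also accepts `sparse=True`, which stores the variables as sparse arrays instead (this requires the `sparse` package).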

import numpy as np
import pandas as pd
import xarray as xr

num_points = 20
point_locations = np.random.randint(99, size=(num_points, 3))
point_data1 = np.ones(num_points)
point_data2 = np.random.randint(5, size=num_points)

# One row per point: the coordinates and the data values are all columns
df = pd.DataFrame()
df['x'] = point_locations[:, 0]
df['y'] = point_locations[:, 1]
df['z'] = point_locations[:, 2]
df['point_data1'] = point_data1
df['point_data2'] = point_data2
# The (x, y, z) MultiIndex tells xarray which columns become dimensions
df = df.set_index(['x', 'y', 'z'])
ds = xr.Dataset.from_dataframe(df)
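
To make the shape of the result concrete, here is a small sketch of the same approach on three hand-picked points. The coordinate and data values are made up for illustration and are not from the question. It shows that each dimension's size is the number of unique values along that axis, and that grid cells without a point are left as NaN.

```python
import pandas as pd
import xarray as xr

# Three illustrative points (hypothetical values), one row per point.
df = pd.DataFrame(
    {
        "x": [0, 1, 2],
        "y": [10, 10, 20],
        "z": [5, 6, 5],
        "point_data1": [1.0, 1.0, 1.0],
        "point_data2": [3, 0, 4],
    }
).set_index(["x", "y", "z"])

ds = xr.Dataset.from_dataframe(df)

# Dimension sizes follow the unique values per axis: 3 x-values, 2 y-values, 2 z-values.
sizes = dict(ds.sizes)

# Look up one of the original points by its coordinates.
value = ds["point_data2"].sel(x=1, y=10, z=6).item()

# Only the cells that correspond to real points hold data; the rest are NaN.
n_cells = ds["point_data2"].size
n_filled = int(ds["point_data2"].notnull().sum())
```

Because the grid is the full product of the unique coordinates, its size grows much faster than the number of points, which is where `sparse=True` or a 1D layout becomes attractive.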
JSoothe