2

I need to perform some geometric operations with geometries from another source on a netCDF-file. Therefore I store the geometries (shapely.geometry.Polygon) from the other source in a geopandas.GeoDataFrame.

Next is to read a netCDF file into a GeoDataFrame. The recipe seems clear: read the netCDF with xarray, store it into a pandas.DataFrame, perform a shapely.geometry.Point operation on the extracted lat/lon data and convert it into a GeoDataFrame.

Afterwards, I will do some statistics with geometry-operators.


When I read the netCDF file with xarray (see here)

import xarray as xr
dnc = xr.open_dataset(ff)  
df = dnc.to_dataframe()

I get

>>>> dnc   
<xarray.Dataset>
Dimensions:  (lat: 16801, lon: 19201)
Coordinates:
   * lat      (lat) float32 -32.0 -31.9992 -31.9983 -31.9975 -31.9967 ...
   * lon      (lon) float32 -73.0 -72.9992 -72.9983 -72.9975 -72.9967 ...
Data variables:  
     hgt      (lat, lon) int16 0 0 0 4 0 5 0 9 9 8 0 0 0 0 0 0 0 0 0 0 0 0 0 ...

>>> dnc.hgt.size
322596001
>>> dnc.lat.size
16801
>>> dnc.lon.size
19201

and

>>> df.head()
                  hgt   
lat   lon                  
-32.0 -73.000000    0  
      -72.999168    0  
      -72.998337    0  
      -72.997498    4  
      -72.996666    0

In df there is no access on latand lon. I also have problems to understand the partially empty column lat. So I think that the shapely.geometry.Point((lon, lat)) must be performed on dnc for every combination of lon and lat. Is that right? Any ideas how to code it?

Georgy
  • 12,464
  • 7
  • 65
  • 73
TSchonn
  • 21
  • 1
  • 3
  • The index in your `df` is a `Pandas.MultiIndex`. There are a bunch of ways to do this, try `df.reset_index()` to start. – jhamman Sep 20 '17 at 22:59

2 Answers2

3

Like @jhamman mentioned in the comments, your lats and lons are indexes in your pandas frame. So starting with that

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from io import StringIO

s = StringIO('''             
    lat,lon,hgt                  
    -32.0,-73.000000,0  
    -32.0,-72.999168,0  
    -32.0,-72.998337,0  
    -32.0,-72.997498,4  
    -32.0,-72.996666,0
    ''')

    df = pd.read_csv(s)
    df = df.set_index(['lat', 'lon'])

We'll first reset the frame's index

df = df.reset_index()

then we'll create our geometry. i.e. shapely points with a list comp

geom = [Point(x,y) for x, y in zip(df['lon'], df['lat'])]

and now we convert our Pandas DataFrame to a GeoPandas GeoDataFrame

gdf = gpd.GeoDataFrame(df, geometry=geom)
print(gdf.head())
    lat        lon  hgt                                          geometry
0 -32.0 -73.000000                      0                 POINT (-73 -32)
1 -32.0 -72.999168                      0  POINT (-72.99916800000001 -32)
2 -32.0 -72.998337                      0  POINT (-72.99833700000001 -32)
3 -32.0 -72.997498                      4  POINT (-72.99749799999999 -32)
4 -32.0 -72.996666                      0          POINT (-72.996666 -32)
Bob Haffner
  • 8,235
  • 1
  • 36
  • 43
  • Thanks @bob-haffner, @jhamman, `df.reset_index()` worked well. The huge amount of data made the computer swapping while resetting the index. I will work on the geom-operation next week and give feedback. – TSchonn Sep 22 '17 at 07:40
  • Ok, perhaps there's a way to avoid the reset by changing the way the dataframe is created (xarray.Dataset -> DataFrame) so that it doesn't set lat, lon as the index in the first place? – Bob Haffner Sep 22 '17 at 13:55
0

It took me even longer to try some solutions for my swapping memory (8 GB). At last I tried dask, but my approach is still not the right one:

for f in nc_files:
ff = os.path.join(nc_path, f)
try:
    dnc = xr.open_dataset(ff, chunks={'lat': 400, 'lon': 400})
    df = dnc.to_dataframe()
    df = df.reset_index()
    geom = [Point(x,y) for x, y in zip(df['lon'], df['lat'])]
    gdf = gpd.GeoDataFrame(df, geometry=geom)
    print(gdf.head())
except Exception as e:
    print(e)

As mentioned above, the files are big:

>>> dnc.hgt.size
322596001
>>> dnc.lat.size
16801
>>> dnc.lon.size
19201

Is there another approach to create a geometry.Point directly from the netCDF-File?

TSchonn
  • 21
  • 1
  • 3