
A note up front: if anything is unclear or you need more information, please say so in the comments and I will try to provide it.

Unfortunately I don't know much about NetCDF files, but I need them for my thesis. I am working with the following CSV file (loaded as dfn), which contains H2 data published at irregular intervals since 1950 for a number of measuring points.

     Country Latitude Longitude  Altitude  Sample Name        Date      H2  Year  month      dates
5         DE  511.622   149.506       238       199706  15.06.1997   -71.7  1997      6 1997-06-15
6         DE  511.622   149.506       238       199707  15.07.1997   -70.1  1997      7 1997-07-15
7         DE  511.622   149.506       238       199708  15.08.1997   -64.5  1997      8 1997-08-15
8         DE  511.622   149.506       238       199709  15.09.1997   -39.1  1997      9 1997-09-15
9         DE  511.622   149.506       238       199710  15.10.1997   -56.4  1997     10 1997-10-15
...      ...      ...       ...       ...          ...         ...     ...   ...    ...        ...
4995      DE  490.422   121.019       365       201304  15.04.2013  -41.86  2013      4 2013-04-15
4996      DE  490.422   121.019       365       201305  15.05.2013  -68.03  2013      5 2013-05-15
4997      DE  490.422   121.019       365       201306  15.06.2013  -54.98  2013      6 2013-06-15
4998      DE  490.422   121.019       365       201307  15.07.2013  -39.23  2013      7 2013-07-15
4999      DE  490.422   121.019       365       201308  15.08.2013  -46.93  2013      8 2013-08-15

I want to create a NetCDF file from this CSV file, which has the dimensions: Time, Lat, Long and Height. As variables I want to define Time, Long, Lat, Height, Country and H2, where H2 depends on the 4 dimensions (Time, Lat, Long and Height).

Question number 1: How do I do this?

Question number 2: How can I insert the data from the CSV file into the variables?
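For illustration, the structure described above can be sketched as follows. This is a minimal sketch, not a definitive answer: the three rows are made-up stand-ins for the real CSV, the column names follow the table above, and `dates` is assumed to be the time dimension. Setting the four dimension columns as a MultiIndex and calling `to_xarray()` yields a Dataset in which H2 depends on exactly those four dimensions and already carries the CSV values.

```python
import pandas as pd

# Toy rows standing in for the real CSV (all values are made up for illustration)
df = pd.DataFrame({
    "Country": ["DE", "DE", "DE"],
    "Latitude": [51.2, 51.2, 49.0],
    "Longitude": [14.9, 14.9, 12.1],
    "Altitude": [238, 238, 365],
    "dates": pd.to_datetime(["1997-06-15", "1997-07-15", "2013-04-15"]),
    "H2": [-71.7, -70.1, -41.86],
})

# Columns that should become dimensions of the Dataset
dims = ["dates", "Latitude", "Longitude", "Altitude"]

# Keep only the dimension columns plus the variable of interest,
# then use the dimension columns as a MultiIndex before converting.
ds = df[dims + ["H2"]].set_index(dims).to_xarray()

# Attach metadata to the data variable
ds["H2"].attrs = {"units": "per mille", "long_name": "Deuterium"}

print(ds)
```

Calling `ds.to_netcdf(path)` on the result would then write the file, provided a NetCDF backend such as netCDF4 or scipy is installed. Country is left out here because it does not vary along all four dimensions.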

My code looks like this so far (my data can be found here: https://filebin.net/m0r2qcb90o373v5h):

import pandas as pd
import xarray

# read the semicolon-separated CSV; read_csv already returns a DataFrame
dfn = pd.read_csv(r"C:/Users/Oliver Weisser/Desktop/Bachelor/Programm/Daten/Beispiel_PLOT/Example_File.csv", sep=";")

# create xarray Dataset from Pandas DataFrame
# (the column is named 'month' in the CSV, not 'Month')
xr = dfn.set_index(['Latitude', 'Longitude', 'Altitude', 'Year', 'month']).to_xarray()

# add variable attribute metadata
xr['Latitude'].attrs = {'units': 'degrees', 'long_name': 'Latitude'}
xr['Longitude'].attrs = {'units': 'degrees', 'long_name': 'Longitude'}
xr['Altitude'].attrs = {'units': 'm', 'long_name': 'Altitude'}
xr['H2'].attrs = {'units': 'per mill', 'long_name': 'Deuterium'}
xr['month'].attrs = {'units': 'Month', 'long_name': 'Month'}
xr['Year'].attrs = {'units': 'Year', 'long_name': 'Year'}

# add global attribute metadata
xr.attrs = {'Conventions': 'CF-1.6', 'title': 'Data', 'summary': 'Data '}

# print the Dataset summary
print(xr)

# save to netCDF
xr.to_netcdf('C:/Users/Oliver Weisser/Desktop/Bachelor/Programm/Daten/Classifyed/H2_netCDF/test.nc')

(I followed the code from: xarray writing to netCDF from Pandas - dimension issue)

My xr output is:


>>> print (xr)
<xarray.Dataset>
Dimensions:          (dates: 533, Latitude: 14, Longitude: 14)
Coordinates:
  * dates            (dates) datetime64[ns] 1970-07-15 1970-08-15 ... 2013-12-15
  * Latitude         (Latitude) object '476.772' '478.008' ... '540.967'
  * Longitude        (Longitude) object '11,59' '110.108' ... '84.889' '9,19'
Data variables: (12/18)
    Country          (dates, Latitude, Longitude) object nan nan nan ... nan nan
    Altitude         (dates, Latitude, Longitude) float64 nan nan ... nan nan
    Sample Name      (dates, Latitude, Longitude) float64 nan nan ... nan nan
    Date             (dates, Latitude, Longitude) object nan nan nan ... nan nan
    Begin of Period  (dates, Latitude, Longitude) object nan nan nan ... nan nan
    End of Period    (dates, Latitude, Longitude) object nan nan nan ... nan nan
    ...               ...
    end month        (dates, Latitude, Longitude) float64 nan nan ... nan nan
    begin days       (dates, Latitude, Longitude) float64 nan nan ... nan nan
    end days         (dates, Latitude, Longitude) float64 nan nan ... nan nan
    Days             (dates, Latitude, Longitude) float64 nan nan ... nan nan
    Year             (dates, Latitude, Longitude) float64 nan nan ... nan nan
    month            (dates, Latitude, Longitude) float64 nan nan ... nan nan
Attributes:
    Conventions:  CF-1.6
    title:        Data
    summary:      Data generated
>>>
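The `object` dtype of Latitude and Longitude in this output, with entries such as '11,59' next to '110.108', suggests that the columns were read as strings with mixed decimal separators. A possible quick fix, sketched here on made-up strings, is to normalise the separator and then parse the column with `pd.to_numeric`. This assumes the values contain no thousands separators, in which case the replacement would silently produce wrong numbers.

```python
import pandas as pd

# Made-up strings mixing ',' and '.' as the decimal separator
raw = pd.DataFrame({
    "Longitude": ["11,59", "110.108", "9,19"],
    "H2": ["-71,7", "-70.1", "-64.5"],
})

for col in ["Longitude", "H2"]:
    # Replace the comma literally (no regex), then parse to float;
    # entries that still cannot be parsed become NaN instead of raising.
    normalised = raw[col].str.replace(",", ".", regex=False)
    raw[col] = pd.to_numeric(normalised, errors="coerce")

print(raw.dtypes)
```

With numeric columns, the coordinates come out as float64 rather than object after the conversion to xarray.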
Weiss
  • Please provide an [mcve] we can copy-paste. What doesn't work about your code? What does `xr` look like in the end? – Joooeey May 06 '22 at 13:33
  • Hi @Joooeey, is there somewhere here a possibility to upload data? – Weiss May 06 '22 at 13:37
  • Not that I know of but there are lots of places to upload files on the internet. Github gist comes to mind. Then just read the csv with pandas: https://stackoverflow.com/a/41880513/4691830 – Joooeey May 06 '22 at 13:44
  • When I open `xr` and inspect it in an explorer, the data from the CSV file is not in it, and every column of the CSV file has become a variable of its own that depends on the dimensions lat, long and dates. My question is how to assign the right dimension to each variable, so that e.g. Lat depends only on the dimension Lat and not on the dimensions Long and Date. And secondly, how can I insert the CSV data into the NetCDF file? – Weiss May 06 '22 at 13:44
  • Can you please just post the output of `print(xr)` in your question? – Joooeey May 06 '22 at 13:47
  • Do you need anything else to understand my question better? (I am new here and do not know yet what information is useful for answering the questions) :) – Weiss May 06 '22 at 13:53
  • yes, the input data or a minimum version of the input data. – Joooeey May 06 '22 at 13:54
  • Does your data fill the 4-D grid or is it sparse? – Joooeey May 06 '22 at 13:56
  • I think it's sparse. – Weiss May 06 '22 at 13:57
  • In that case you'd expect some nans in the output. – Joooeey May 06 '22 at 13:58
  • And can you clarify what variables and dimensions you need? It's weird to have the same columns as both dimensions and variables. Once you have done that, make sure the dataframe only has columns for the variables of interest, with all the dimensions in the multiindex. – Joooeey May 06 '22 at 14:01
  • And read the [mcve] link again. The newest edit adds a lot of superfluous code and the data is still on local disc where we can't access it. – Joooeey May 06 '22 at 14:03
  • Ok I will adjust everything but can take a little while. But thank you very much :) – Weiss May 06 '22 at 14:11
  • One more thing I noticed so far: Looks like your H2 and some other columns are strings, not numbers. To fix this: `pd.read_csv(..., decimal=',')` Perhaps that was the main problem. – Joooeey May 06 '22 at 14:13
  • I have tried to work around this by replacing the commas with the command `df["H2"] = df["H2"].str.replace(',', '.')`. – Weiss May 06 '22 at 14:24
  • That's exactly what I noticed but the result of this is a string again. And you need a number. Best to solve this early on. – Joooeey May 06 '22 at 14:28
  • OK, and if I use the command `pd.read_csv(..., decimal=',')`, won't that create a problem for the other values, which have a . as the decimal character? – Weiss May 06 '22 at 14:36
  • Yes, it will. Best practice is still to solve this closer to the source, because having a CSV with mixed number formats is a nightmare. To get a quick fix instead, find the pandas method for converting from string to float. – Joooeey May 06 '22 at 14:42
  • OK, I have fixed this now, I just don't know how to make the data available for everyone. – Weiss May 06 '22 at 14:52
  • Then take your time to figure it out. Upload it anywhere on the internet (google search file sharing, there are thousands of options) and then read it with `pd.read_csv`. https://stackoverflow.com/a/41880513/4691830 Just make sure it's okay for you to publish the data. – Joooeey May 06 '22 at 14:59
  • OK, I will take another look at the weekend. Many thanks already. – Weiss May 06 '22 at 15:04
  • Hi @Joooeey, I finally found some time to upload my data, and I also changed the code :) – Weiss May 08 '22 at 18:02
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/244596/discussion-between-weiss-and-joooeey). – Weiss May 09 '22 at 06:52
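
The point about sparse data raised in the comments can be illustrated with a small sketch (two hypothetical stations with made-up values, not the asker's data). `to_xarray()` places the rows on the full grid spanned by all index values, so every combination that was never measured is filled with NaN, while the original observations are still present in the grid.

```python
import pandas as pd

# Two hypothetical stations with one sample each (values made up)
df = pd.DataFrame({
    "dates": pd.to_datetime(["1997-06-15", "2013-04-15"]),
    "Latitude": [51.2, 49.0],
    "Longitude": [14.9, 12.1],
    "H2": [-71.7, -41.86],
})

ds = df.set_index(["dates", "Latitude", "Longitude"]).to_xarray()

total = ds["H2"].size            # 2 dates * 2 latitudes * 2 longitudes = 8 cells
filled = int(ds["H2"].count())   # number of cells that are not NaN
print(total, filled)

# Dropping the NaNs again recovers the original observations
back = ds["H2"].to_series().dropna()
print(back)
```

With many stations and dates the filled fraction shrinks quickly, which would be consistent with a printed summary that shows almost only nan.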
