
I need to calculate the plane-of-array (POA) irradiance using Python's pvlib package (https://pvlib-python.readthedocs.io/en/stable/). For this I would like to use the output data from the WRF model (GHI, DNI, DHI). The output data is in netCDF format, which I open using the netCDF4 package, and then I extract the necessary variables using the wrf-python package.

With that I get an xarray.Dataset with the variables I will use. I then use the xarray.Dataset.to_dataframe() method to transform it into a pandas DataFrame, and then I transform the DataFrame into a numpy array using dataframe.values. Finally, I run a loop where each iteration calculates the POA for one grid point using the function irradiance.get_total_irradiance (https://pvlib-python.readthedocs.io/en/stable/auto_examples/plot_ghi_transposition.html).

That's the way I've been doing it so far. However, I have over 160,000 grid points in the WRF domain, the data is hourly, and it spans 365 days, which amounts to a very large volume of data. I believe it could be faster if pvlib could work directly with the xarray.Dataset, but the only way I could make it work was by transforming the data into a numpy array and looping through the rows. Could anyone tell me how I can optimize this calculation? The code I developed is very time-consuming.

If anyone can help me with this I would appreciate it. Maybe an improvement to the code, or another way to calculate the POA from the WRF data...

I'm providing the code I've built so far:

from pvlib import location
from pvlib import irradiance

import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import netCDF4
import wrf
from tqdm import tqdm

Getting WRF data

variaveis = ['T2',
             'U10',
             'V10',
             'SWDDNI',
             'SWDDIF',
             'SWDOWN']

netcdf_data = netCDF4.Dataset('wrfout_d02_2003-11-01_00_00_00')

first = True

for v in variaveis:

    var = wrf.getvar(netcdf_data, v, timeidx=wrf.ALL_TIMES)
    
    if first:
        met_data = var
        first = False
    else:
        met_data = xr.merge([met_data, var])

met_data = met_data.reset_coords(['XTIME'], drop=True)
met_data['T2'] = met_data['T2'] - 273.15

WS10 = (met_data['U10']**2 + met_data['V10']**2)**0.5
met_data['WS10'] = WS10

df = met_data[['SWDDIF', 
               'SWDDNI', 
               'SWDOWN', 
               'T2', 
               'WS10']].to_dataframe().reset_index().drop(columns=['south_north', 
                                                                   'west_east'])

df.rename(columns={'SWDOWN': 'ghi',
                   'SWDDNI':'dni', 
                   'SWDDIF':'dhi', 
                   'T2':'temp_air', 
                   'WS10':'wind_speed',
                   'XLAT': 'lat',
                   'XLONG': 'lon',
                   'Time': 'time'}, inplace=True)
df.set_index(['time'], inplace=True)

df = df[df.ghi>0]
df.index = df.index.tz_localize('America/Recife')

Function to get POA irradiance

def get_POA_irradiance(lon, lat, date, dni, dhi, ghi, tilt=10, surface_azimuth=0):

    site_location = location.Location(lat, lon, tz='America/Recife')

    # Get solar azimuth and zenith to pass to the transposition function
    solar_position = site_location.get_solarposition(times=date)
    
    # Use the get_total_irradiance function to transpose the GHI to POA
    POA_irradiance = irradiance.get_total_irradiance(
        surface_tilt = tilt,
        surface_azimuth = surface_azimuth,
        dni = dni,
        ghi = ghi,
        dhi = dhi,
        solar_zenith = solar_position['apparent_zenith'],
        solar_azimuth = solar_position['azimuth'])
    
    # Return DataFrame with only GHI and POA
    
    return pd.DataFrame({'lon': lon,
                         'lat': lat,
                         'GHI': ghi,
                         'POA': POA_irradiance['poa_global']}, index=[date])

Loop in each row (time) of the array

array = df.reset_index().values
    
list_poa = []
    
def loop_POA():
    # range(len(array)) rather than range(len(array) - 1),
    # which silently skipped the last row
    for i in tqdm(range(len(array))):
        POA = get_POA_irradiance(lon=array[i, 6],
                                 lat=array[i, 7],
                                 dni=array[i, 2],
                                 dhi=array[i, 1],
                                 ghi=array[i, 3],
                                 date=str(array[i, 0]))
        list_poa.append(POA)

    return list_poa

poa_final = pd.concat(loop_POA())  # was pd.concat(lista), an undefined name
  • side point, have you tried reading the data with [`xr.open_dataset`](http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html)? Just curious why you'd use the netCDF package directly – Michael Delgado Aug 20 '21 at 21:37
  • Hmm yeah others might have better domain-specific news but based on some very quick digging into the pvlib package it's not written to handle numpy arrays with extra dimensions or anything. If each component of the array needs to be handled by a pure python function, this will be slow. Your options as far as I can tell are to rewrite the functions you need or to just speed up the total runtime with dask or another parallel processing library. – Michael Delgado Aug 20 '21 at 21:47
  • It's a numfocus project though so they probably have people wrestling with the same question. You might have more luck posting an issue or with their mailing list. – Michael Delgado Aug 20 '21 at 21:49
  • Does the WRF output have solar position? If so, use that and save yourself the calculation time. Most of the low-level transposition functions can accept higher dimensional input, so use those instead of wrappers like get_total_irradiance. – Will Holmgren Aug 21 '21 at 01:38
  • Dear, thank you very much for the answers. @Michael Delgado, I use the netCDF package directly following the wrf-python documentation. Last time I checked the getvar function didn't work with xarray.Dataset. Thanks for the other suggestions. – Robson Barreto Aug 21 '21 at 02:23
  • @Will Holmgren, thanks for the reply. Unfortunately, for this WRF simulation, the solar position was not saved in the output files, only the solar zenith angle. That's why I'm using the algorithm to calculate the solar position. – Robson Barreto Aug 21 '21 at 02:25
  • If you're only missing solar azimuth, then consider using [solar_azimuth_analytical](https://pvlib-python.readthedocs.io/en/stable/generated/pvlib.solarposition.solar_azimuth_analytical.html#pvlib-solarposition-solar-azimuth-analytical). It's likely accurate enough for your application (since you're already starting with some kind of modeled data). – Will Holmgren Aug 21 '21 at 19:30

1 Answer


Thanks both for a good question and for using pvlib! You're right that pvlib is intended for modeling single locations and is not designed for use with xarray datasets, although some functions might coincidentally work with them.

I strongly suspect that the majority of the runtime you're seeing is for the solar position calculations. You could switch to a faster method (see the method options here), as the default solar position method is very accurate but also quite slow when calculating bulk positions. Installing numba will help, but it still might be too slow for you, so you might check the other models (ephemeris, pyephem). There are also some fast but low-precision methods, but you will need to change your code a bit to use them. See the list under "Correlations and analytical expressions for low precision solar position calculations" here.

Like Michael Delgado suggests in the comments, parallel processing is an option. But that can be a headache in python. You will probably want multiprocessing, not multithreading.
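A minimal multiprocessing sketch of that idea, assuming the long-format `df` built in the question: group by grid point and hand each point's full time series to a worker process. The worker body below is a placeholder so the sketch runs on its own; the real pvlib transposition call from `get_POA_irradiance` would go there.

```python
from multiprocessing import Pool

import numpy as np
import pandas as pd

def poa_for_point(item):
    # item is one group from df.groupby(['lat', 'lon']):
    # ((lat, lon), DataFrame holding that point's full time series).
    (lat, lon), point_df = item
    out = point_df.copy()
    # Placeholder only -- substitute the pvlib transposition here.
    out['POA'] = out['ghi']
    return out

# Toy stand-in for the real df: 2 grid points x 3 timestamps.
df = pd.DataFrame({'lat': [-8.0] * 3 + [-8.5] * 3,
                   'lon': [-35.0] * 6,
                   'ghi': np.arange(6.0)})
groups = list(df.groupby(['lat', 'lon']))

if __name__ == '__main__':
    # Each worker handles whole time series, so the per-task overhead
    # is amortized over thousands of timestamps.
    with Pool(processes=2) as pool:
        poa_final = pd.concat(pool.map(poa_for_point, groups))
```
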

Another idea is to use atlite, a python package designed for this kind of spatial modeling. But its solar modeling capabilities are not nearly as detailed as pvlib, so it might not be useful for your case.

One other note: I don't know if the WRF data are interval averages or instantaneous values, but if you care about accuracy you should handle them differently for transposition. See this example.
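For instance, if the WRF fields turn out to be hour-ending interval averages (an assumption here, not something the question establishes), a common trick is to evaluate solar position at the interval midpoints rather than at the labels:

```python
import pandas as pd

# Hourly labels, assumed here to mark the *end* of each averaging interval.
labels = pd.date_range('2003-11-01 01:00', periods=24, freq='h',
                       tz='America/Recife')

# Timestamps representative of each interval: shift back by half the
# interval, then pass these midpoints to get_solarposition.
midpoints = labels - pd.Timedelta(minutes=30)
```
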

Edit to add: after looking at your code again, there might be another significant speedup to be had. Are you calling get_POA_irradiance for single combinations of position and timestamp? If so, that is unnecessary and very slow. It would be much faster to pass in the full time series for each location, i.e. scalar lat/lon but vector irradiance.

kevinsa5
  • kevinsa5, thank you very much for your generous reply. In fact, I was calling the get_POA_irradiance function for single combinations of position, timestamp, ghi... and after implementing your suggestion the execution got much faster than the previous way. This is my first time using pvlib for something, so I couldn't even identify that this was a problem. I really appreciate your other suggestions, I'll check them out too. – Robson Barreto Aug 21 '21 at 02:12