1

I have been using the STK toolbox for a few days, for kriging of environmental parameter fields, i.e. in a geostatistical context.

I find the toolbox very well implemented and useful (big thanks to the authors!), and the kriging predictions I am getting through STK actually seem fine; however, I am finding myself unable to visualize a semivariogram model based on the STK output (i.e. estimated parameters for gaussian process / covariance functions).

I am attaching an example figure, showing the empirical semivariogram for a simple 1D test case and a Gaussian semivariogram model (as typically used in geostatistics, see also figure) fitted directly to that data. The figure further shows a semivariogram model based on STK output, i.e. using previously estimated model parameters (model.param from stk_param_estim) to get covariance K on a target grid of lag distances and then converting K to semivariance (according to the well-known relation semivar = K0-K where K0 is the covariance at zero lag). I am attaching a simple script to reproduce the figure and detailing the attempted conversion.

As you can see in the figure, this doesn’t do the trick. I have tried several other simple examples and STK datasets, but models obtained through STK vs direct fitting never agree, and in fact usually look much more different than in the example (i.e. the range often seems very different, in addition to the sill/sigma2; uncomment line 12 in the script to see another example). I have also attempted to input the converted STK parameters into the geostatistical model (also in the script), however, the output is identical to the result based on converting K above.

I’d be very thankful for your help!

Figure illustrating the lack of agreement between semivariograms based on direct fit vs conversion of STK output

% Code to reproduce the figure illustrating my problem of getting
% variograms from STK output. The only external functions needed are those
% included with STK.
% TEST DATA - This is simply a monotonic part of the normal pdf
nugget = 0;
X = [0:20]'; % coordinates
% X = [0:50]'; % uncomment this line to see how strongly the models can deviate for different test cases
V =  normpdf(X./10+nugget,0,1); % observed values

covmodel = 'stk_gausscov_iso'; % covar model, part of STK toolbox
variomodel = 'stk_gausscov_iso_vario'; % variogram model, nested function

% GET STRUCTURE FOR THE SELECTED KRIGING (GAUSSIAN PROCESS) MODEL
nDim = size(X,2);
model = stk_model (covmodel, nDim);
model.lognoisevariance = NaN; % This makes STK fit nugget

% ESTIMATE THE PARAMETERS OF THE COVARIANCE FUNCTION
[param0, model.lognoisevariance] = stk_param_init (model, X, V); % Compute an initial guess for the parameters of the covariance function (param0)
model.param = stk_param_estim (model, X, V, param0); % Now model the covariance function

% EMPIRICAL SEMIVARIOGRAM (raw, binning removed for simplicity)
D = pdist(X)';
semivar_emp = 0.5.*(pdist(V)').^2;

% THEORETICAL SEMIVARIOGRAM FROM STK
% Target grid of lag distances
DT = [0:1:100]';
DT_zero = zeros(size(DT));
% Get covariance matrix on target grid using STK estimated pars
pairwise = true;
K = feval(model.covariance_type, model.param, DT, DT_zero, -1, pairwise);
% convert covariance to semivariance, i.e. G = C(0) - C(h)
sill = exp(model.param(1));
nugget = exp(model.lognoisevariance);
semivar_stk = sill - K + nugget; % --> this variable is then plotted

% TEST: FIT A GAUSSIAN VARIOGRAM MODEL DIRECTLY TO THE EMPIRICAL SEMIVARIOGRAM
f = @(par)mseval(par,D,semivar_emp,variomodel);
par0 = [10 10 0.1]; % initial guess for pars
[par,mse] = fminsearch(f, par0); % optimize
semivar_directfit = feval(variomodel, par, DT); % evaluate

% TEST 2: USE PARS FROM STK AS INPUT TO GAUSSIAN VARIOGRAM MODEL
par(1) = exp(model.param(1)); % sill, PARAM(1) = log (SIGMA ^ 2), where SIGMA is the standard deviation,
par(2) = sqrt(3)./exp(model.param(2)); % range, PARAM(2) = - log (RHO), where RHO is the range parameter. --- > RHO = exp(-PARAM(2))
par(3) = exp(model.lognoisevariance); % nugget
semivar_stkparswithvariomodel = feval(variomodel, par, DT);

% PLOT SEMIVARIOGRAM
figure(); hold on;
plot(D(:), semivar_emp(:),'.k'); % Observed variogram, raw
plot(DT, semivar_stk,'-b','LineWidth',2); % Theoretical variogram, on a grid
plot(DT, semivar_directfit,'--r','LineWidth',2); % Test direct fit variogram
plot(DT,semivar_stkparswithvariomodel,'--g','LineWidth',2); % Test direct fit variogram using pars from stk
legend('raw empirical semivariance (no binned data here for simplicity) ',...
    'Gaussian cov model from STK, i.e. exp(Sigma2) - K + exp(lognoisevar)',...
    'Gaussian semivariogram model (fitted directly to semivariance)',...
    'Gaussian semivariogram model (using transformed params from STK)');
xlabel('Lag distance','Fontweight','b');
ylabel('Semivariance','Fontweight','b');

% NESTED FUNCTIONS
% Objective function for direct fit
function [mse] = mseval(par,D,Graw,variomodel)
Gmod = feval(variomodel, par, D);
mse = mean((Gmod-Graw).^2);
end
% Gaussian semivariogram model.
function [semivar] = stk_gausscov_iso_vario(par, D) %#ok<DEFNU>
% D : lag distance, c : sill, a : range, n : nugget
c = par(1); % sill
a = par(2); % range
if length(par) > 2, n = par(3); % nugget optional
else, n = 0; end
semivar = n + c .* (1 - exp( -3.*D.^2./a.^2 )); % Model
end
mdh
  • 11
  • 3

1 Answers1

0

There is nothing wrong with the way you compute the semivariogram.

To understand the figure that you obtain, consider that:

  1. The parameters of the model are estimated in STK using the (restricted) maximum likelihood method, not by least-squares fitting on the semi-variogram.
  2. For very smooth stationary random fields observed over short intervals, you should not expect that the theoretical semivariogram will agree with the empirical semivariogram, with or without binning. The reason for this is that the observations, and thus the squared differences, are very correlated in this case.

To convince yourself of the second point, you can run the following script repeatedly:

% a smooth GP
model = stk_model (@stk_gausscov_iso, 1);
model.param = log ([1.0, 0.2]);  % unit variance

x_max = 20;  x_obs = x_max * rand (50, 1);

% Simulate data
z_obs = stk_generate_samplepaths (model, x_obs);

% Empirical semivariogram (raw, no binning)
h = (pdist (double (x_obs)))';
semivar_emp = 0.5 * (pdist (z_obs)') .^ 2;

% Model-based semivariogram
x1 = (0:0.01:x_max)';
x0 = zeros (size (x1));
K = feval (model.covariance_type, model.param, x0, x1, -1, true);
semivar_th = 1 - K;

% Figure
figure;  subplot (1, 2, 1);  plot (x_obs, z_obs, '.');
subplot (1, 2, 2);  plot (h(:), semivar_emp(:),'.k');  hold on;
plot (x1, semivar_th,'-b','LineWidth',2);
legend ('empirical', 'model');  xlabel ('lag');  ylabel ('semivar');

Further questions on parameter estimation for Gaussian process models should probably be asked on Cross-Validated rather than Stack Overflow.

Julien Bect
  • 118
  • 5