
I am using an OpenMDAO semi-structured metamodel as part of a Dymos optimisation. There are two input values, so the input space is 2D. Usually this works fine. However, I recently noticed that, for a certain set of training data, it does not interpolate properly.

I used the metamodel HTML visualisation tool to look at what was going on and could see the fit was wrong. Hovering over certain data points, I could see that they displayed the input data values properly. However, if I move the cursor the slightest bit away from such a point in any direction, the interpolated result is wildly different. This means the metamodel "fit" does not pass through the training data points, or even come close to them, in some regions.

This issue was present when using the 'slinear' method. I switched to 'lagrange2' and the fit seems much better now. However, that method appears to be very computationally expensive: my optimisation has yet to complete, and it has already taken over three times as long as it did with 'slinear'. I would therefore like to be able to go back to 'slinear'.
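For reference, the component is set up roughly like the sketch below. The variable names and training values are placeholders since the real data is proprietary; switching interpolants is just a change to the 'method' option.

    import numpy as np
    import openmdao.api as om

    # Placeholder training data -- the real table is proprietary. Every
    # (x1, x2) pair is listed explicitly, as the semi-structured format requires.
    x1_train = np.array([0.1, 0.1, 0.2, 0.2, 0.8, 0.8])
    x2_train = np.array([0.1, 0.2, 0.1, 0.2, 0.8, 0.9])
    f_train = np.array([1.0, 1.1, 1.2, 1.3, 2.0, 2.1])

    # Switching interpolants is just a change to the 'method' option.
    interp = om.MetaModelSemiStructuredComp(method='slinear')  # or method='lagrange2'
    interp.add_input('x1', training_data=x1_train)
    interp.add_input('x2', training_data=x2_train)
    interp.add_output('f', training_data=f_train)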

Does anyone have any insight into why this is happening and how to resolve the issue? All help is greatly appreciated. Thanks.

J_Code

2 Answers


The semi-structured metamodel is tricky. For this 2D table, it assumes that the data can be decomposed into two sequential interpolations: one between lines, and one between points on those lines. If you have enough points on all sides of the query point to match your chosen order, then it behaves like the structured interpolation. Unfortunately, semi-structured grids can be missing data in such a way that you end up extrapolating on the sub-dimension, possibly from far away. Here is an example:

x, y
[0.1, 0.1]
[0.1, 0.2]
[0.2, 0.1]
[0.2, 0.2]
[0.8, 0.8]
[0.8, 0.9]

If you try to compute the value at [0.21, 0.15], the next line in x is found at 0.8. That line is only defined for y between 0.8 and 0.9, so its contribution is extrapolated far from its defined values, and this introduces an abrupt change in the interpolated value.
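You can reproduce this with the component directly. The sketch below uses the six points above with a made-up output function (f = sin(3x) + sin(3y), purely so the example runs); the second query's value is partly built from an extrapolation along the x = 0.8 line, far outside its defined y range.

    import numpy as np
    import openmdao.api as om

    # The six (x, y) points from the table above. The outputs come from a
    # made-up test function, f = sin(3x) + sin(3y), just so the example runs.
    x = np.array([0.1, 0.1, 0.2, 0.2, 0.8, 0.8])
    y = np.array([0.1, 0.2, 0.1, 0.2, 0.8, 0.9])
    f = np.sin(3 * x) + np.sin(3 * y)

    interp = om.MetaModelSemiStructuredComp(method='slinear', vec_size=2)
    interp.add_input('x', training_data=x)
    interp.add_input('y', training_data=y)
    interp.add_output('f', training_data=f)

    prob = om.Problem()
    prob.model.add_subsystem('interp', interp)
    prob.setup()

    # The first query sits inside the dense lower-left patch. The second is the
    # point discussed above: its x = 0.8 line contribution is extrapolated from
    # y in [0.8, 0.9] all the way down to y = 0.15.
    prob.set_val('interp.x', np.array([0.15, 0.21]))
    prob.set_val('interp.y', np.array([0.15, 0.15]))
    prob.run_model()

    print(prob.get_val('interp.f'))
    print(np.sin(3 * np.array([0.15, 0.21])) + np.sin(3 * 0.15))  # true values, for comparison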

Here are some things you could try:

  1. You could try swapping the dimensions on your table to see if you have better behavior. (So if your table is X,Y, try Y,X.) It is data-dependent, but it might help make the extrapolation occur in a direction that is more suitable.

  2. You could try the OpenMDAO unstructured metamodel. There will be a performance cost.

  3. You may be able to get better behavior by adding a few key points to your semi-structured table. Look for points that would prevent internal extrapolation -- in the example above, adding values for (0.8, 0.1) and (0.8, 0.2) helps define behavior across that wide gap (see the sketch after this list).

  4. Finally, if you can turn your data into a fully structured metamodel, you can eliminate the interior extrapolation entirely, and you can also take advantage of some much faster algorithms.
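To illustrate point 3, here is the same sketch with values added at (0.8, 0.1) and (0.8, 0.2), with outputs again from the made-up f = sin(3x) + sin(3y); the query at [0.21, 0.15] is now interpolated on both lines instead of relying on a distant extrapolation.

    import numpy as np
    import openmdao.api as om

    # The example table from above, plus the two suggested points at (0.8, 0.1)
    # and (0.8, 0.2). Outputs are the same made-up f = sin(3x) + sin(3y).
    x = np.array([0.1, 0.1, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8])
    y = np.array([0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.8, 0.9])
    f = np.sin(3 * x) + np.sin(3 * y)

    interp = om.MetaModelSemiStructuredComp(method='slinear')
    interp.add_input('x', training_data=x)
    interp.add_input('y', training_data=y)
    interp.add_output('f', training_data=f)

    prob = om.Problem()
    prob.model.add_subsystem('interp', interp)
    prob.setup()
    prob.set_val('interp.x', 0.21)
    prob.set_val('interp.y', 0.15)
    prob.run_model()

    # The x = 0.8 line now brackets y = 0.15, so no internal extrapolation occurs.
    print(prob.get_val('interp.f'))
    print(np.sin(3 * 0.21) + np.sin(3 * 0.15))  # true value, for comparison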

Ultimately, the semi-structured metamodel is best used for data with well-defined regions, where you don't expect to be querying the table in any of the internal extrapolation zones.

Kenneth Moore
  • Hi Kenneth, thanks for the quick and very helpful answer. Those are some useful suggestions, especially #4 – J_Code Mar 14 '23 at 21:24

I'd need to see your data to have any guess as to why the linear interpolation isn't working, so I can't comment much on that. Most likely your semi-structured data has a hole in it near that region, and the linear interpolation is very poor there.

However, I can offer some advice on how to improve the speed. Semi-structured data is convenient to use sometimes, but it comes at a very large computational cost. If all you have is semi-structured data but you want better speed, you can consider re-interpolating the data onto a structured grid first, then passing that into a structured metamodel instead. There is a stand-alone interface for semi-structured metamodels that you can use to write a small script that loops over a structured input grid and re-interpolates the data for you.

You should use the 2D-slinear or 2D-lagrange2 options. These are the fast, fixed dimension options that will give the best performance.
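Here is a rough sketch of that workflow. The semi-structured table below is a placeholder with a made-up output function (substitute your real training data), and the grid bounds and resolution are choices you would make based on your data.

    import numpy as np
    import openmdao.api as om

    # Placeholder semi-structured table (3 x-lines, 3 points per line) with a
    # made-up output function -- substitute your real training data here.
    x_semi = np.repeat([0.1, 0.2, 0.8], 3)
    y_semi = np.array([0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
    f_semi = np.sin(3 * x_semi) + np.sin(3 * y_semi)

    # Regular grid to re-interpolate onto; bounds and resolution are your choice.
    x_grid = np.linspace(0.1, 0.8, 15)
    y_grid = np.linspace(0.1, 0.9, 17)
    xg, yg = np.meshgrid(x_grid, y_grid, indexing='ij')

    # Evaluate the semi-structured metamodel at every grid point in one vectorized pass.
    semi = om.MetaModelSemiStructuredComp(method='lagrange2', vec_size=xg.size)
    semi.add_input('x', training_data=x_semi)
    semi.add_input('y', training_data=y_semi)
    semi.add_output('f', training_data=f_semi)

    p = om.Problem()
    p.model.add_subsystem('semi', semi)
    p.setup()
    p.set_val('semi.x', xg.ravel())
    p.set_val('semi.y', yg.ravel())
    p.run_model()
    f_grid = p.get_val('semi.f').reshape(xg.shape)

    # Build the structured metamodel from the re-interpolated table, using one of
    # the fast fixed-dimension methods mentioned above.
    structured = om.MetaModelStructuredComp(method='2D-lagrange2')
    structured.add_input('x', 0.0, training_data=x_grid)
    structured.add_input('y', 0.0, training_data=y_grid)
    structured.add_output('f', 0.0, training_data=f_grid)

The structured component can then be dropped into the Dymos model in place of the semi-structured one. Keep in mind that grid points falling in holes of the original data will carry whatever values the semi-structured interpolant extrapolated there.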

In general, I don't recommend using slinear interpolation for optimization, because it is not smooth. It does have the nice benefit of fitting the input data exactly, but the cost of that C1 discontinuity is pretty large on some problems (not all, but many). So some kind of smoothing is usually best.

Justin Gray
  • Thank you for your very quick answer. The data is unfortunately proprietary, but I really appreciate your willingness to look at it. The information in your answer should be enough to help me out. Thanks again. – J_Code Mar 14 '23 at 21:22
  • Update: Interpolating the data onto a structured grid was massively helpful in reducing computation time. Thank you. – J_Code Mar 22 '23 at 18:25