
This SO answer describes how to find the point closest to multiple lines in n-dimensional space using the least-squares method. Is there any existing implementation of this, but for the point closest to multiple m-flats in n-dimensional space?

My use case is that I used Multivariate Linear Regression to determine a relationship between an independent variable vector x and a dependent variable vector y, and now I'd like to find the x that would make the stochastic process output values as close as possible to experimental data y_exp. Since running MvLR yields a set of n-dimensional equations like so...
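(Roughly, with beta_{i,j} denoting the coefficients estimated by the regression, these equations have the form:)

y_1 = beta_{1,0} + beta_{1,1} x_1 + ... + beta_{1,p} x_p        (1)
...
y_q = beta_{q,0} + beta_{q,1} x_1 + ... + beta_{q,p} x_p        (q)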

...and I have experimental values for y_1, ... y_q, each of these equations represents the intersection between two (n-1)-flats in n-dimensional space (one described by equation (1), and the other by y_1 = y_exp_1), which should yield an (n-2)-flat.

Is there an existing implementation to find a good set of x_1, ... x_p that gets me closest to these (n-2)-flats?

adentinger

1 Answer


Rather than interpreting the set of equations like that, you could see it as a simple set of q linear equations with p variables -- in which case the problem has a much easier solution!

This set of equations is 100% equivalent to...
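(In matrix form, the system is:)

A x = b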

where:
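(Presumably: A is the q-by-p matrix whose entry A_{i,j} is the coefficient of x_j in equation i -- i.e., A is Beta transposed, as the code comment below notes -- x = (x_1, ..., x_p) is the vector of unknowns, and b_i is y_exp_i minus the constant term of equation i.)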

In which case the least squares approximation method yields...
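(That is, x* is the x that minimizes the squared residual:)

x* = argmin_x ||A x - b||^2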

Assuming the rank of matrix A is equal to p, the number of unknowns (i.e., A has full column rank -- you have at least p linearly independent equations, which is likely if your data is experimental and q >= p), then this is equivalent to:
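(This is the normal-equations formula; it is what the code below computes:)

x* = (A^T A)^{-1} A^T b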

For an explanation of this, see this excellent video.

Python code:

import numpy as np

A = np.array(...) # The 2D (q x p) coefficient matrix described above
b = np.array(...) # The 1D (length q) vector described above

n_unknowns = A.shape[1]  # p, the number of x variables
a_rank = np.linalg.matrix_rank(A)
if a_rank != n_unknowns:
    raise ValueError(
        "Rank of matrix A for least squares approximation is "
        + f"inappropriate. Expected {n_unknowns} but was {a_rank}. This "
        + "means either there are not enough equations, or too many of "
        + "the equations are linearly dependent."
    )
# x* = (A^T A)^-1 A^T b -- the least squares solution
x_star = np.linalg.inv(A.transpose().dot(A)).dot(A.transpose()).dot(b)
print("x*:", x_star)

In case you also want the R²:

# Predicted values: b_hat = A x* (equivalently (x*) * Beta, since A is Beta transposed)
b_hat = A.dot(x_star)
sse = np.square(b - b_hat).sum()
sst = np.square(b - b.mean()).sum()
rsquared = 1 - sse / sst
adentinger