Best fit plane for 3D data

Question

I have my 3D data X,Y,Z (Matrices with size NxM)

I want to fit it to the best fit plane what I did:

X = X(isfinite(X));% deleting the NaN because svd Doesn't accept them
Y = Y(isfinite(Y));
Z = Z(isfinite(Z));

G = [X,Y,Z,ones(size(X(:)))];
[u s v] = svd(G,0);
P = v(:,4);
scalar = 2*P./P(1);
P = P./scalar; % supposed to be my plane equation but there is something wrong

and then recalculate the Z from X and Y

Z = -(P(1)*X + P(2)*Y + P(4)) / P(3);

I don't know what the problem is!!

I don't even know where to start correcting all of the problems here. Start over. Go slowly. Think about what you are doing. — , Sep 13 '13 at 12:59
When you say something is wrong, what is wrong? We don't know what the problem is either, unless you at least describe it. — Peter, Sep 13 '13 at 14:23
For other readers, this seems to have been pulled from my answer on plane fitting, which does indeed work: http://stackoverflow.com/a/10904220/1401351 — Peter, Sep 13 '13 at 14:24
1. If the different arrays have different numbers of hans, then X,Y,Z will NOT have the same number of elements after the deletions that were made. — , Sep 13 '13 at 14:26
2. If X,Y,Z are NxM matrices, then concatenating them using commas will NOT produce the fit you might be expecting! This will just do something ridiculous, and unexpected. — , Sep 13 '13 at 14:28
3. If your goal is to do an "errors in variables" fit, then there is NO reason to append a column of ones. Simply subtract off the global means. Then SVD will work properly. — , Sep 13 '13 at 14:30
Yes Peter I have got this from your answer but maybe I have made a mistake with applying it — Jack_111, Sep 13 '13 at 14:33
WoodChips the different matrices have the same number of NaNs — Jack_111, Sep 13 '13 at 14:37
4. The stuff with "scalar = 2*P./P(1)" is completely messed up. Hard to guess what they are trying to do there. Dividing by the coefficient of X, then they divide P by scalar. Essentially this just multiplies P by P(1)/2 — , Sep 13 '13 at 14:37
@WoodChips the If you look again to the answer you will see that I don't have any more NxM Matrices. I have now Column vectors because I applied the isfinit function — Jack_111, Sep 13 '13 at 14:38
@ Peter You are the only one who knows his answer and you may help me in that — Jack_111, Sep 13 '13 at 14:39
Even if the arrays have the SAME number of NaNs, it is still obscene code, bound to produce bugs at some point in the future. Are the NaNs known to be in the same places? — , Sep 13 '13 at 14:39
IF your goal is to do a planar fit, I'll write an answer that is mathematically/numerically correct. — , Sep 13 '13 at 14:40
My goal is to make planar fit for my data Yes and I will appreciate it if your code works with me :) because I am stuck now — Jack_111, Sep 13 '13 at 14:42
You ONLY have vectors if there were NaNs in there to remove. Suppose that one day, your matrices DON't have any NaNs in them? Then your code will bug out because nothing was removed, then your arrays will still be NxM. My point is, this is terrible code, aimed to generate bugs for sure in your future. — , Sep 13 '13 at 14:43
IT won't bug because if my matrices don't have NaNs in them , then it won't delete any of them and I will have column vector :). but My data alwaysss has NaNs — Jack_111, Sep 13 '13 at 14:45
The 2*P./P(1) was specific to the original answer. ABCD plane coefficients can vary by a scalar, and the scaling by 2 was just to show that the derived numbers really did match the original coefficients. — Peter, Sep 13 '13 at 19:06

score 9 · Accepted Answer · 2013-09-13T17:33:49.763

Ok, so the goal is to do a planar fit to your data. We presume that there MAY be NaN elements in at some of the arrays. So lets do an errors in variables fit to the data.

% First, remove NaNs. All the NaN (if any) will be in the same places,
% so we need do only one test. Testing each of X,Y,Z leads to confusion
% and leads to the expectation that the NaNs are NOT in the same places
% with some potential.
k = isfinite(X);
if all(k(:))
  % in this case there were no nans, so reshape the arrays into vectors
  % While X(k) would convert an array into a column vector anyway in
  % this case, it seems far more sane to do the reshape explicitly,
  % rather than let X(k) do it implicitly. Again, a source of ambiguity
  % removed.
  X = X(:);
  Y = Y(:);
  Z = Z(:);
else
  X = X(k);
  Y = Y(k);
  Z = Z(k);
end

% Combine X,Y,Z into one array
XYZ = [X,Y,Z];

% column means. This will allow us to do the fit properly with no
% constant term needed. It is necessary to do it this way, as
% otherwise the fit would not be properly scale independent
cm = mean(XYZ,1);

% subtract off the column means
XYZ0 = bsxfun(@minus,XYZ,cm);

% The "regression" as a planar fit is now accomplished by SVD.
% This presumes errors in all three variables. In fact, it makes
% presumptions that the noise variance is the same for all the
% variables. Be very careful, as this fact is built into the model.
% If your goal is merely to fit z(x,y), where x and y were known
% and only z had errors in it, then this is the wrong way
% to do the fit.
[U,S,V] = svd(XYZ0,0);

% The singular values are ordered in decreasing order for svd.
% The vector corresponding to the zero singular value tells us
% the direction of the normal vector to the plane. Note that if
% your data actually fell on a straight line, this will be a problem
% as then there are two vectors normal to your data, so no plane fit.
% LOOK at the values on the diagonal of S. If the last one is NOT
% essentially small compared to the others, then you have a problem
% here. (If it is numerically zero, then the points fell exactly in
% a plane, with no noise.)
diag(S)

% Assuming that S(3,3) was small compared to S(1,1), AND that S(2,2)
% is significantly larger than S(3,3), then we are ok to proceed.
% See that if the second and third singular values are roughly equal,
% this would indicate a points on a line, not a plane.
% You do need to check these numbers, as they will be indicative of a
% potential problem.
% Finally, the magnitude of S(3,3) would be a measure of the noise
% in your regression. It is a measure of the deviations from your
% fitted plane.

% The normal vector is given by the third singular vector, so the
% third (well, last in general) column of V. I'll call the normal
% vector P to be consistent with the question notation.
P = V(:,3);

% The equation of the plane for ANY point on the plane [x,y,z]
% is given by
%
%    dot(P,[x,y,z] - cm) == 0
%
% Essentially this means we subtract off the column mean from our
% point, and then take the dot product with the normal vector. That
% must yield zero for a point on the plane. We can also think of it
% in a different way, that if a point were to lie OFF the plane,
% then this dot product would see some projection along the normal
% vector.
%
% So if your goal is now to predict Z, as a function of X and Y,
% we simply expand that dot product.
% 
%   dot(P,[x,y,z]) - dot(P,cm) = 0
%
%   P(1)*X + P(2)*Y + P(3)*Z - dot(P,cm) = 0
%
% or simply (assuming P(3), the coefficient of Z) is not zero...

Zhat = (dot(P,cm) - P(1)*X) - P(2)*Y)/P(3);

Be careful, in that if the coefficient of Z was small, then the resulting prediction will be very sensitive to small perturbations in that coefficient. Essentially it would means that the plane was a vertical plane, or nearly so.

Again, all of this assumes that your goal was truly an errors in variables computation, where we had noise in all three variables X,Y,Z. If the noise was only in Z, then the proper solution is far simpler.

@Schorsch For +5 you can award a bounty. – Evgeni Sergeev May 29 '14 at 08:00 — Evgeni Sergeev, May 29 '14 at 08:00

Best fit plane for 3D data

1 Answers1

Linked