0

I am just wondering, how would I go about fitting a line to histogram, using the z-counts as weights? An example of this is shown below (although this post just discusses overlaying multiple plots), taken from Scatter plot with density in Matlab).

My initial thought is to make an array consisting of each pixel from the density plot, repeated n times to make a scatter plot (n == the number of counts), then do a linear polyfit. This seems awfully redundant though.

enter image description here

Community
  • 1
  • 1
Bob Dole
  • 59
  • 1
  • 2
  • 10
  • 1
    You would stand a better chance of getting an answer if people can play around with some of your code. You should post a [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). – jub0bs Dec 06 '14 at 21:45
  • Sure, I could include code. It is not coding I am having trouble with, please read the question carefully prior to commenting. It's a concept. I said I know how to do what I want, I just want a better way to do it, i.e. a specific function. – Bob Dole Dec 06 '14 at 22:14
  • 1
    In defense of Jubobs, StackOverflow bills itself as being for questions about code (http://stackoverflow.com/tour) So, questions with no code raise more questions. – chipaudette Dec 06 '14 at 22:48
  • Well then I must self implode this questions as it is moo! I have flagged it to be deleted. Will be sure to refrain from asking questions that are irrelevant. – Bob Dole Dec 06 '14 at 23:26
  • Hmm, it appears most questions do not have code in them. Perhaps (as indicated here: http://stackoverflow.com/help/on-topic), questions about code and code toolboxes are indeed on-topic. – Bob Dole Dec 06 '14 at 23:38
  • This is a common point of discussion/argument/disagreement on StackOverflow. There's no need to take it personally. I found your question interesting, which why I answered it (twice!). I think it should stay. – chipaudette Dec 07 '14 at 13:15

2 Answers2

2

The other approach is to do a weighted least squares solution. You need the (x,y) location of each pixel and the number of counts n within each pixel. Then, I think that you'd do the weighted least-squares this way:

%gather your known data...have x,y, and n all in the same order as each other
A = [x(:) ones(length(x),1)];  %here are the x values from your histogram
b = y(:);  %here are the y-values from your histogram
C = diag(n(:));  %counts from each pixel in your 2D histogram

%Define polynomial coefficients as p = [slope; y_offset]; 

%usual least-squares solution...written here for reference
% b = A*p;               %remember, p = [slope; y_offset];
% p = inv(A'*A)*(A'*b);  %remember, p = [slope; y_offset];

%We want to apply a weighting matrix, so incorporate the weighting matrix
% A' * b = A' * C * A * p;  
p = inv(A' * C * A)*(A' * b);  %remember, p = [slope; y_offset];

The biggest uncertainty for me with this solution is whether the C matrix should be made up of n or n.^2, I can never remember. Hopefully, someone can correct me in the comments, if needed.

chipaudette
  • 1,655
  • 10
  • 13
  • Makes sense, I suppose I was hoping for a simple MATLAB function, but this method is easy to implement and understand. That or I could just use the method I described above. Thanks for your helpful responses. – Bob Dole Dec 06 '14 at 22:13
  • If you give this a try, be sure to not include pixels with an n=0. If you do, I'm pretty sure that the zeros in the resulting `C` matrix will make `A' * C * A` not invertable and the solution will fail. – chipaudette Dec 07 '14 at 13:12
  • It take that back. I think that even if there are at least 2 non-zeros in the diagonal of `C`, `A'*C*A` will still end up as a sufficiently full 2x2 matrix to do the inversion. – chipaudette Dec 07 '14 at 13:21
0

If you have the original data, which is a collection of (x,y) points, you simply do a polyfit on the original data:

p = polyfit(x(:),y(:),1);  %linear fit

That will give you a best fit (in the least-squares sense) to the original data, which is what you want.

If you do not have the original data, and you only have the 2D histogram, the approach that you defined (which basically recreates a facsimile of the original data) will give a similar answer as if you did the polyfit on the original data.

chipaudette
  • 1,655
  • 10
  • 13