
I have sets of x, y, z points in 3D space and another variable called charge, which represents the amount of charge deposited at a specific x, y, z coordinate. I would like to do a weighted fit of a line to this data (weighted by the amount of charge deposited in the detector, so more charge simply means a higher weight) such that the line passes through a given point, the vertex. In other words, I want the line through the vertex that minimizes the charge-weighted sum of squared distances to the points.

Now, when I did this for 2D, I tried all sorts of methods (bringing the vertex to the origin, applying the same translation to all the other points, and forcing the fit through the origin; giving the vertex a very high weight), but none of them were as good as the answer given here by Jaime: How to do a polynomial fit with fixed points

It uses the method of Lagrange multipliers, which I'm vaguely familiar with from an undergraduate multivariable calculus course, but not much beyond that, and it doesn't look like adapting that code is as simple as just adding a z coordinate. (Note that even though the code doesn't take the amount of charge deposited into account, it still gave me the best results.) I was wondering if there is a version of the same algorithm out there in 3D. I also contacted the author of the answer by email, but didn't hear back from him.
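
For reference, the rough idea behind that answer, as I understand it, is to solve the constrained least-squares problem directly. Here is a generic 2D sketch of the technique (my own reconstruction, not Jaime's exact code; the optional weights are something I added myself, and the function name is just illustrative):

import numpy as np

def polyfit_through_point(x, y, deg, xf, yf, w=None):
    """Least-squares polynomial fit forced through the single point (xf, yf)
    via a Lagrange multiplier. Sketch only, not the code from the linked answer."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    A = np.vander(x, deg + 1)                   # design matrix, highest power first
    C = np.vander(np.atleast_1d(xf), deg + 1)   # constraint row: p(xf) = yf
    # KKT system: minimize ||sqrt(w) * (A c - y)||^2 subject to C c = yf
    AtWA = A.T @ (w[:, None] * A)
    AtWy = A.T @ (w * y)
    n = deg + 1
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = 2 * AtWA
    K[:n, n:] = C.T
    K[n:, :n] = C
    rhs = np.concatenate([2 * AtWy, np.atleast_1d(float(yf))])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]                              # polynomial coefficients

In 2D this works because the fitted curve is the graph y = p(x); a line in 3D is not the graph of a single polynomial in x, which is why the extension doesn't look like it's just a matter of adding a z coordinate.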

Here is some more information about my data and what I'm trying to do in 2D: How to weigh the points in a scatter plot for a fit?

Here is my code for doing this in a way where I bring the vertex to the origin and then fit the data with fit_intercept=False. I'm currently pursuing this method for the 2D data, since I'm not sure there's a 3D version of the Lagrange-multiplier approach out there, but there are linear-regression ways of doing this in 3D, for instance here: Fitting a line in 3D (a sketch of that idea, adapted to pass through the vertex, follows the code below):

import math

import numpy as np
import sklearn.linear_model

def plot_best_fit(image_array, vertexX, vertexY):
    weights = np.array(image_array)
    size = len(image_array) * len(image_array[0])
    # Flattened y array holding the row index of every pixel with nonzero charge.
    y = np.zeros((len(image_array), len(image_array[0])))
    rows, cols = np.where(weights > 0)
    for r, c in zip(rows, cols):
        y[r][c] = r
    y = y.reshape(size)
    # Row indices repeated for every column (np.tile replaces the Python 2
    # `range(...) * n` list repetition, which fails in Python 3).
    x = np.tile(np.arange(len(image_array)), len(image_array[0]))
    weights = weights.reshape(size)
    # Shift everything so the vertex sits at the origin, then force the fit
    # through the origin with fit_intercept=False.
    x = x - vertexX
    y = y - vertexY
    model = sklearn.linear_model.LinearRegression(fit_intercept=False)
    model.fit(x.reshape((-1, 1)), y, sample_weight=weights)
    line_x = np.linspace(0, 512, 100).reshape((-1, 1))
    pred = model.predict(line_x)
    m, b = np.polyfit(np.linspace(0, 512, 100), np.array(pred), 1)
    angle = math.atan(m) * 180 / math.pi
    return line_x, pred, angle, b, m

image_array is a numpy array, and vertexX and vertexY are the x and y coordinates of the vertex, respectively. Here's my data: https://uploadfiles.io/bbhxo. I cannot create toy data, as there is no simple way of replicating it: it was produced by a Geant4 simulation of a neutrino interacting with an argon nucleus, and I don't want to strip the complexity out of the data. This particular event happens to be one for which my code does not work, and I'm not sure I could generate data specifically so that my code fails on it.
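
For the 3D case, the closest adaptation of the vertex-to-origin idea that I can think of is to shift every point by the vertex and take the principal eigenvector of the charge-weighted second-moment matrix; that eigenvector is the direction minimizing the charge-weighted sum of squared perpendicular distances for a line through the vertex. This is only a sketch I have not validated on my data, and the function name is mine:

import numpy as np

def fit_line_through_vertex(points, charges, vertex):
    """Weighted 3D line fit constrained to pass through `vertex`.

    `points` is an (N, 3) array, `charges` an (N,) array of weights.
    Returns a unit direction d; the fitted line is vertex + s * d.
    """
    r = np.asarray(points, dtype=float) - np.asarray(vertex, dtype=float)
    w = np.asarray(charges, dtype=float)
    # Charge-weighted second-moment matrix of the shifted points.  No mean is
    # subtracted, because the line is forced through the vertex (the origin here).
    M = (r * w[:, None]).T @ r
    # The direction with the largest weighted projection (equivalently the one
    # minimizing the weighted perpendicular distances) is the top eigenvector.
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, -1]

# Toy usage: three collinear points through the vertex (0, 1, -1).
pts = np.array([[1.0, 2.0, 0.0], [2.0, 3.0, 1.0], [3.0, 4.0, 2.0]])
print(fit_line_through_vertex(pts, np.ones(3), vertex=[0.0, 1.0, -1.0]))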

  • Are you trying to provide more weight to points with more charge? Or are you trying to force the fitted line through a few key points? These are two different problems. The first problem already has an answer in your previous question. Lagrange multipliers will help with the second problem (i.e. fitting a curve subject to constraints). – Andrew Guy Aug 13 '18 at 00:48
  • Ideally both (in my case, there's just one point that I have to fit it through). But as I said in the question, if I could get a Lagrange multiplier for 3D (since it worked the best for 2D) without the weights, that'd be enough. – Always Learning Forever Aug 13 '18 at 01:19
  • So you have i) a single point that has to be on the line of best fit, and ii) want to apply weights to all other points? I would re-centre your data around your constraint point, and then just fit a polynomial regression using Scikit-learn with appropriate weights, setting `fit_intercept=False`. – Andrew Guy Aug 13 '18 at 01:49
  • When you say recenter my data around constraint point, do you mean bring the constraint point to the origin and change all the other points by the same amount? – Always Learning Forever Aug 13 '18 at 02:41
  • Also, no, that doesn't work for my data – Always Learning Forever Aug 13 '18 at 03:21
  • Here's one of my data files, if you're interested: https://ufile.io/6ljp1. The first column has the x axis points, the 2nd the y axis, the 3rd the weights. There is some text at the top and bottom, so you may have to edit it out – Always Learning Forever Aug 13 '18 at 03:29
  • Yes, I mean subtract the coordinates of the constraint point from each data point in the dataset. Why doesn't this work for your data? – Andrew Guy Aug 13 '18 at 03:32
  • That's a very good question, I'm not sure. You can take a look at the data yourself, if you want. The coordinates of the vertex are 129, 255. Also, this is just in 2D for now; once I've figured out a decent way of doing this, I'll try that approach in 3D – Always Learning Forever Aug 13 '18 at 03:34
  • The method of Lagrange Multipliers *does* work here, but I don't have a counterpart of it in 3D yet, and I wanna make sure that the method I'm using is the same in 2D and in 3D. – Always Learning Forever Aug 13 '18 at 03:40
  • If you have working code for the 2D case, please post a minimal, complete, reproducible example here. There might be a simple change to extend to the 3D case. Otherwise you're asking someone to do all the work for you. – Andrew Guy Aug 13 '18 at 03:44
  • Sure, here is my numpy array (the vertex is included at the top): https://ufile.io/bbhxo. Here's my code: https://textuploader.com/d7ssa. You pass the numpy array as the first argument image_array, and vertexX, vertexY are the x, y coordinates of the vertex. This example suffers from the problem that it doesn't fit the data properly. The one that works best, again, is the Lagrange multiplier one referenced in the question. It will take me a while to make my current code more self-contained so it can be used, but I can do that if you need it – Always Learning Forever Aug 13 '18 at 03:54
  • Please don't link to external sites for code/data files. I have no way of knowing if they are trustworthy. You should edit your question to include the relevant code and some toy data. Also, make sure it is a MCVE - https://stackoverflow.com/help/mcve If you do that, you dramatically increase the chances of someone solving your problem. – Andrew Guy Aug 13 '18 at 04:25
  • I can include my code here, but I don't think I can mimic my data. My code works fine for most of my data, but it doesn't for certain samples. This is one of those samples that do not work – Always Learning Forever Aug 13 '18 at 04:31
  • Just let me ask again, the only thing you want is a line that passes through a fixed point and best fits a set of data, where the data points have different weights? – mikuszefski Aug 16 '18 at 15:17
  • @mikuszefski That's correct – Always Learning Forever Aug 16 '18 at 17:09
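
In code, the re-centre-and-regress approach suggested in the comments would look roughly like this in 3D (a minimal sketch with made-up toy data; using x as the free parameter assumes the track is not perpendicular to the x axis):

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data just to make the snippet runnable: points scattered around a line,
# with `charge` as the per-point weight and (vx, vy, vz) as the vertex.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 40)
vx, vy, vz = 1.0, 2.0, -0.3
x = vx + t + rng.normal(scale=0.1, size=t.size)
y = vy + 0.5 * t + rng.normal(scale=0.1, size=t.size)
z = vz - 0.8 * t + rng.normal(scale=0.1, size=t.size)
charge = rng.uniform(0.5, 2.0, size=t.size)

# Shift so the vertex sits at the origin, then regress (y, z) on x with no intercept.
X = (x - vx).reshape(-1, 1)
Y = np.column_stack((y - vy, z - vz))
model = LinearRegression(fit_intercept=False)
model.fit(X, Y, sample_weight=charge)
m_y, m_z = model.coef_[:, 0]   # slopes dy/dx and dz/dx
# The fitted line is (vx, vy, vz) + s * (1, m_y, m_z).
print(m_y, m_z)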

1 Answer


This is more of a hand-made solution using basic optimization. It is straightforward: one measures the distance of each point to the line being fitted and minimizes the weighted distances with optimize.leastsq. The code is as follows:

# -*- coding: utf-8 -*-
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm
from scipy import optimize
import numpy as np

def rnd( a ):
    return  a * ( 1 - 2 * np.random.random() ) 

def affine_line( s, theta, phi, x0, y0, z0 ):
    a = np.sin( theta) * np.cos( phi )
    b = np.sin( theta) * np.sin( phi )
    c = np.cos( theta )
    return np.array( [ s * a + x0, s * b + y0, s * c + z0 ] )

def point_to_line_distance( x , y, z , theta, phi, x0, y0, z0 ):
    xx = x - x0
    yy = y - y0
    zz = z - z0
    a = np.sin( theta) * np.cos( phi )
    b = np.sin( theta) * np.sin( phi )
    c = np.cos( theta )
    r = np.array( [ xx, yy, zz ] )
    t = np.array( [ a, b, c ] )
    return np.linalg.norm( r - np.dot( r, t) * t )

def residuals( parameters, fixpoint, data, weights=None ):
    theta, phi = parameters
    x0, y0, z0 = fixpoint
    if weights is None:
        w = np.ones( len( data ) )
    else:
        w = np.array( weights )
    diff = np.array( [ point_to_line_distance( x , y, z , theta, phi , *fixpoint ) for x, y, z in data ] )
    diff = diff * w
    return diff

### some test data
fixpoint = [ 1, 2 , -.3 ]
trueline = np.array( [ affine_line( s, .7, 1.7, *fixpoint ) for s in np.linspace( -1, 2, 50 ) ] )
rndData = np.array( [ np.array( [ a + rnd( .6), b + rnd( .35 ), c + rnd( .45 ) ] ) for a, b, c in trueline ] )
zData = [ 20 * point_to_line_distance( x , y, z , .7, 1.7, *fixpoint ) for x, y, z in rndData ]

### unweighted
bestFitValuesUW, ier= optimize.leastsq(residuals, [ 0, 0],args=( fixpoint, rndData ) )
print( bestFitValuesUW )
uwLine = np.array( [ affine_line( s, bestFitValuesUW[0], bestFitValuesUW[1], *fixpoint ) for s in np.linspace( -2, 2, 50 ) ] )

### weighted ( inverse distance chosen as weight here; it would be the charge in the OP's case )
bestFitValuesW, ier= optimize.leastsq(residuals, [ 0, 0],args=( fixpoint, rndData, [ 1./s for s in zData ] ) )
print( bestFitValuesW )
wLine = np.array( [ affine_line( s, bestFitValuesW[0], bestFitValuesW[1], *fixpoint ) for s in np.linspace( -2, 2, 50 ) ] )

### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1, projection='3d' )
ax.plot( *np.transpose(trueline ) ) 
ax.scatter( *fixpoint, color='k' )
ax.scatter( rndData[::,0], rndData[::,1], rndData[::,2] , c=zData, cmap=cm.jet )

ax.plot( *np.transpose( uwLine ) ) 
ax.plot( *np.transpose( wLine ) ) 

ax.set_xlim( [ 0, 2.5 ] )
ax.set_ylim( [ 1, 3.5 ] )
ax.set_zlim( [ -1.25, 1.25 ] )

plt.show()

which returns

>> [-0.68236386 -1.3057938 ]
>> [-0.70928735 -1.4617517 ]

(plot of the fit results)

The fixed point is shown in black and the original line in blue. The unweighted and weighted fits are in orange and green, respectively. The data is colored according to its distance from the line.

mikuszefski
  • Quick question: what are the two values that are returned exactly? [-0.68236386 -1.3057938 ], are they theta and phi? – Always Learning Forever Sep 12 '18 at 06:52
  • Also what do you mean by "original line"? – Always Learning Forever Sep 12 '18 at 06:53
  • @AlwaysLearningForever Yes, the returned parameters are the best fit values for `theta` and `phi`. As this is generic data, I generated it by taking a given line and adding random numbers. That line is the 'original line', and the fit should hence be close to it. – mikuszefski Sep 12 '18 at 07:11
  • @AlwaysLearningForever In this case, I / we have the advantage of knowing what the fit should look like. It might help to play with it a little. One can see, e.g., that depending on how the random points are distributed, the weighting has more or less impact. – mikuszefski Sep 12 '18 at 07:13