
I have a matrix of shape 256x256 for which I'm trying to find a line of best fit. This is an image, by the way, so these are simply intensity values. Let's assume I want to find the line of best fit through all the intensities; how would I go about doing that? This link describes how to do it for a 3-D dataset using SVD, but I'm a bit confused as to how that would be applied to my numpy array.

EDIT: Here's an example with random double values that I've profiled with %timeit:

import numpy as np

ran = [25, 50, 75, 100, 125]
for i in ran:
    J = np.random.random((i, i))
    y, x = np.indices(J.shape)
    x = x.ravel()
    y = y.ravel()
    J = J.ravel()
    data = np.concatenate((x[:, np.newaxis],
                           y[:, np.newaxis],
                           J[:, np.newaxis]), axis=1)
    datamean = data.mean(axis=0)
    print "Doing %d now" % i
    %timeit U, S, V = np.linalg.svd(data - datamean)

I get the following output:

Doing 25 now
100 loops, best of 3: 10.4 ms per loop
Doing 50 now
1 loops, best of 3: 285 ms per loop
Doing 75 now
1 loops, best of 3: 3 s per loop
Doing 100 now
1 loops, best of 3: 5.83 s per loop
Doing 125 now
1 loops, best of 3: 15.1 s per loop
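
Note that most of this time is presumably spent forming the full U matrix: np.linalg.svd defaults to full_matrices=True, so U has shape (N, N) for the N = i*i stacked points. A reduced SVD is sufficient for a line fit, and changes only the last line of the loop above:

    # same setup as above; the reduced SVD returns U with shape (N, 3)
    %timeit U, S, V = np.linalg.svd(data - datamean, full_matrices=False)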

EDIT2: Here's my actual array. I just saved it in NumPy's .npy format.


2 Answers


The answer you pointed out is directly applicable to your problem. Start by building the x, y, z arrays:

import numpy as np

z = your_matrix_256_x_256      # the 256x256 intensity image
y, x = np.indices(z.shape)     # pixel coordinates for every entry
x = x.ravel()
y = y.ravel()
z = z.ravel()

Note that the intervals for x and y can be readjusted by multiplying these arrays by appropriate scalars.
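
From there, the SVD recipe from the linked answer can be applied to the stacked points. A minimal sketch, assuming x, y, z are the raveled arrays from above (the fitted line passes through the centroid along the first right-singular vector):

import numpy as np

# stack the raveled coordinates into an (N, 3) point cloud
data = np.column_stack((x, y, z))
datamean = data.mean(axis=0)

# the first right-singular vector of the centered data gives the
# direction of the best-fit 3-D line
_, _, vv = np.linalg.svd(data - datamean, full_matrices=False)
direction = vv[0]

# parametric points on the fitted line: centroid + t * direction
t = np.linspace(-128, 128, 50)[:, np.newaxis]
line_points = datamean + t * direction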


EDIT:

Having a look at your data, it seems that your problem is really a 2-D curve fit, which can be done with np.polyfit(), as shown in the example below.

import numpy as np
import matplotlib.pyplot as plt

z = np.load('arr.npy').astype(np.int32)
y, x = np.indices(z.shape)
# keep only the nonzero intensities, ignoring the bottom edge (y == 0)
valid = (y.ravel() > 0) & (z.ravel() > 0)
x_valid = x.ravel()[valid]
y_valid = y.ravel()[valid]
z_valid = z.ravel()[valid]
# plotting the data
fig = plt.figure(figsize=(5, 3))
ax = fig.add_subplot(111)
ax.scatter(x_valid, y_valid, c=z_valid, alpha=0.2, s=20, edgecolor='none',
           cmap=plt.cm.jet)
# finding the best-fit line, weighting each point by the square root
# of its intensity
coeffs = np.polyfit(x_valid, y_valid, w=z_valid**0.5, deg=1)
p = np.poly1d(coeffs)
# plotting the fit
x_plot = np.linspace(x_valid.min(), x_valid.max(), 100)
y_plot = p(x_plot)
ax.plot(x_plot, y_plot, '-r', lw=2, label='best fit')
ax.set_xlim(0, z.shape[1])
ax.set_ylim(0, z.shape[0])

ax.legend(loc='lower left', frameon=False, fontsize=8)
fig.savefig('test.png', bbox_inches='tight')

which gives:

(plot: the valid data points colored by intensity, with the weighted best-fit line in red)

  • Wow this routine takes a very long time to run. It's been going for 2 hours now, is that normal? I tried it on a joint histogram image, so a 256x256 np array with dtype float64. – faskiat Sep 28 '14 at 19:08
  • @benbran I never tried the routine from the referred answer... can you make your matrix available somehow so that I can test a new routine? – Saullo G. P. Castro Sep 28 '14 at 19:29
  • Sorry I didn't read you wanted the actual matrix. I made it available via a Dropbox link – faskiat Sep 28 '14 at 19:42
  • @benbran I created [this Gist on GitHub](https://gist.github.com/saullocastro/61c1b17bc3bec3ba3561#file-fit_3d_line-py) with a solution I find promising... could you try it out? – Saullo G. P. Castro Sep 28 '14 at 21:18
  • I ran it as a script and it ran pretty quick, 2.921s but it gave a warning: RuntimeWarning: Number of calls to function has reached maxfev = 1400. – faskiat Sep 28 '14 at 23:11
  • @benbran I've edited the answer with what I believe is the right approach for your problem... – Saullo G. P. Castro Sep 29 '14 at 07:04

You can get considerably faster results by avoiding replication of your data and instead adjusting your weights to account for the replicated observations: since np.polyfit applies the weights to the residuals before squaring, a bin containing n identical observations is equivalent to a single point with weight sqrt(n). I have also ignored the presumably bad data clustered on the bottom edge of the image.

import numpy
import matplotlib.pyplot as plt

# Load the data
arr = numpy.load('arr.npy')

# Extract the counts from the data
data = numpy.array([(j, i, n) for (i, j), n in numpy.ndenumerate(arr)
                              if n != 0.0 and i > 0])
x, y, n = data.T
weight = numpy.sqrt(n)

# Fit the data, weighting appropriately for the number of points
p = numpy.polyfit(x, y, 1, w=weight)

# Evaluate the polynomial - this is here to make it easier to use
# other orders
smoothx = numpy.linspace(x.min(), x.max(), 100)
smoothy = numpy.polyval(p, smoothx)

# Plot the data and the fit
plt.pcolor(numpy.log(arr+1))
plt.plot(smoothx, smoothy, color='red', linewidth=2)
plt.xlim([0, 255])
plt.ylim([0, 255])
plt.show()

(plot: fit result, the red best-fit line over the log-scaled intensity image)
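
As a usage note, numpy.polyfit returns the coefficients with the highest degree first, so for this deg=1 fit the slope and intercept can be read off directly:

# p is [slope, intercept] for a degree-1 fit
slope, intercept = p
print('best fit: y = %.3f * x + %.3f' % (slope, intercept))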
