2

The following procedure results in values which do not always match the assigned ones:

from scipy.interpolate import splprep, splev, splrep
import numpy as np

pos2indx = lambda vec: vec.round().astype(np.int64)

t = np.linspace(1,3,150)
x = 150+100*np.sin(t) + 5*np.random.randn(len(t))
y = 150+100*np.cos(t) + 5*np.random.randn(len(t))
z = 150+100*np.cos(t)*np.sin(t) + 5*np.random.randn(len(t))

vector_field = np.zeros((x.max()+10,y.max()+10,z.max()+10,3), dtype=np.float64)

out = splprep([x,y,z],u=t,k=3, full_output=1, quiet=1)
tck, t = out[0]
v = np.transpose(splev(t,tck, der=1))
i,j,k = pos2indx(x),pos2indx(y),pos2indx(z)

vector_field[i,j,k,:] += v
print np.sum(np.abs(vector_field[i,j,k,:]-v))

I expected this procedure to always print zero. However, sometimes it does not! When the output is non-zero, it is of several thousands.

I am not sure whether I am doing something wrong or whether there's some bug here.

I reported this as a scipy issue as well.

solution:

Pauli Virtanen: "The bug is in your code: the i,j,k vectors can contain a given coordinate pair twice." ( https://github.com/scipy/scipy/issues/2581 )

jorgeca posted a similar answer below.

Thanks!

eldad-a
  • 3,051
  • 3
  • 22
  • 25

2 Answers2

2

To summarize your problem, adding v to an array of zeros and then substracting v doesn't always yield an array of zeros:

vector_field = np.zeros((x.max()+10,y.max()+10,z.max()+10,3), dtype=np.float64)
vector_field[i,j,k,:] += v
print np.sum(np.abs(vector_field[i,j,k,:]-v))  # sometimes not 0!!

That can't be right!

In fact, the indices i, j and k are numpy arrays, and so it's using fancy indexing. What should happen when a triplet of indices is repeated? That is, when the i[m] == i[n], j[m] == j[n] and k[m] == k[n] for some m, n. It turns out that by an implementation detail (that can change any time) only the last triplet (in C order) of indices will get assigned, and in the end vector_field[i,j,k,:] doesn't really contain v.

Community
  • 1
  • 1
jorgeca
  • 5,482
  • 3
  • 24
  • 36
  • You are correct, the repeated indices (due to the discretisation of the volume) is the reason. I came back this page to post the same answer I received on the SciPy issues post (answered by Pauli Virtanen ): "The bug is in your code: the i,j,k vectors can contain a given coordinate triplet more than one times." – eldad-a Jun 18 '13 at 17:48
  • 1
    Ah, since you had written that you had reported it I searched the list of issues in github before answering but somehow missed that it had already been closed... It's not a good idea to cross post so in general I'd first ask here and if it turns out to be a bug then report it. Glad you got it sorted out, though. – jorgeca Jun 18 '13 at 18:00
  • I accept your comment regarding the cross post. I was actually almost sure it was a bug that I was hesitating whether to post here at all... Thank you again for both comments! – eldad-a Jun 18 '13 at 18:52
1

The problem seems to be caused by round error.

I did a comparison running your code 1000 times using dtypes: float16, float32, float64 and float96; in vector_field and counted the number of times it gives a non zero answer:

float16: 1000
float32: 1000
float64: 142
float96: 115
Saullo G. P. Castro
  • 56,802
  • 26
  • 179
  • 234
  • If you store `v` (with dtype np.float64) in an array of a different dtype, you lose precision so when comparing it to the original `v` (which is still a np.float64) they won't be equal most of the time (ie, float64(float32(64_bit_number)) != 64_bit_number). Try changing the dtype of `v` too, and you'll get 100-150 differences for any dtype, so this isn't the actual reason. – jorgeca Jun 18 '13 at 17:15