Fitting The Theoretical Equation To My Data

Question

I am very, very new to python, so please bear with me, and pardon my naivety. I am using Spyder Python 2.7 on my Windows laptop. As the title suggests, I have some data, a theoretical equation, and I am attempting to fit my data, with what I believe is the Chi-squared fit. The theoretical equation I am using is enter image description here

import math

import numpy as np

import scipy.optimize as optimize

import matplotlib.pylab as plt

import csv

#with open('1.csv', 'r') as datafile:
 #   datareader = csv.reader(datafile)
 #   for row in datareader:
  #      print ', '.join(row)

t_y_data = np.loadtxt('exerciseball.csv', dtype=float, delimiter=',', usecols=(1,4), skiprows = 1)


print(t_y_data)

t = t_y_data[:,0]

y = t_y_data[:,1]

gamma0 = [.1]

sigma = [(0.345366)/2]*(len(t))

#len(sigma)

#print(sigma)

#print(len(sigma))

#sigma is the error in our measurements, which is the radius of the object


# Dragfunction is the theoretical equation of the position as a function of time when the thing falling experiences a drag force
# This is the function we are trying to fit to our data
# t is the independent variable time, m is the mass, and D is the Diameter

#Gamma is the value of which python will vary, until chi-squared is a minimum



def Dragfunction(x, gamma):
    print x
    g = 9.8
    D = 0.345366
    m = 0.715
#    num = math.sqrt(gamma)*D*g*x
#    den = math.sqrt(m*g)
#    frac = num/den
#    print "frac", frac

    return ((m)/(gamma*D**2))*math.log(math.cosh(math.sqrt(gamma/m*g)*D*g*t))


optimize.curve_fit(Dragfunction, t, y, gamma0, sigma)

This is the error message I am getting:

return ((m)/(gamma*D**2))*math.log(math.cosh(math.sqrt(gamma/m*g)*D*g*t))
TypeError: only length-1 arrays can be converted to Python scalars

My professor and I have spent about three or four hours trying to fix this. He helped me work out a lot of the problems, but this we can't seem to resolve.

Could someone please help? If there is any other information you need, please let me know.

I think you can solve this simply by using `np.log` and `np.sqrt` and `np.cosh`, which act on arrays instead of just scalars. — askewchan, Nov 23 '13 at 01:49
Also, unrelated to your question but, `gamma/m*g` is the same as `gamma*g/m` since multiplication and division are just done left to right in python. You want `gamma/(m*g)` according to your equation at top. — askewchan, Nov 23 '13 at 01:53
askewchan, so, when you say that those functions act on arrays, do you mean that it loops the function through each value in the array? — Mack, Nov 23 '13 at 13:00
Yes, functions on numpy arrays (`ndarray`) act on each element individually. See the example at the beginning of my answer for `np.sqrt`. Other functions act on the array as a whole, like `np.sum`, but usually it's clear which will happen from the context. — askewchan, Nov 23 '13 at 17:02

askewchan · Accepted Answer · 2013-11-23T04:27:54.887

2

Your error message comes from the fact that those math functions only accept a scalar, so to call functions on an array, use the numpy versions:

In [82]: a = np.array([1,2,3])

In [83]: np.sqrt(a)
Out[83]: array([ 1.        ,  1.41421356,  1.73205081])

In [84]: math.sqrt(a)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 math.sqrt(a)

TypeError: only length-1 arrays can be converted to Python scalars

In the process, I happened to spot a mathematical error in your code. Your equation at top says that g is in the bottom of the square root inside the log(cosh()), but you've got it on the top because a/b*c == a*c/b in python, not a/(b*c)

log(cosh(sqrt(gamma/m*g)*D*g*t))

should instead be any one of these:

log(cosh(sqrt(gamma/m/g)*D*g*t))
log(cosh(sqrt(gamma/(m*g))*D*g*t))
log(cosh(sqrt(gamma*g/m)*D*t))     # the simplest, by canceling with the g from outside sqrt

A second error is that in your function definition, you have the parameter named x which you never use, but instead you're using t which at this point is a global variable (from your data), so you won't see an error. You won't see an effect using curve_fit since it will pass your t data to the function anyway, but if you tried to call the Dragfunction on a different data set, it would still give you the results from the t values. Probably you meant this:

def Dragfunction(t, gamma):
    print t
    ...
    return ... D*g*t ...

A couple other notes as unsolicited advice, since you said you were new to python:

You can load and "unpack" the t and y variables at once with:

t, y = np.loadtxt('exerciseball.csv', dtype=float, delimiter=',', usecols=(1,4), skiprows = 1, unpack=True)

If your error is constant, then sigma has no effect on curve_fit, as it only affects the relative weighting for the fit, so you really don't need it at all.

Below is my version of your code, with all of the above changes in place.

import numpy as np
from scipy import optimize         # simplified syntax
import matplotlib.pyplot as plt    # pylab != pyplot

# `unpack` lets you split the columns immediately:
t, y = np.loadtxt('exerciseball.csv', dtype=float, delimiter=',',
                  usecols=(1, 4), skiprows=1, unpack=True)

gamma0 = .1 # does not need to be a list

def Dragfunction(x, gamma):
    g = 9.8
    D = 0.345366
    m = 0.715
    gammaD_m = gamma*D*D/m # combination is used twice, only calculate once for (small) speedup
    return np.log(np.cosh(np.sqrt(gammaD_m*g)*t)) / gammaD_m

gamma_best, gamma_var = optimize.curve_fit(Dragfunction, t, y, gamma0)

edited Nov 23 '13 at 04:27

answered Nov 23 '13 at 02:11

askewchan

45,161
17
118
134

So, when I run this code, I get [ 2.17284808] and [[ 3.63752744]]. I presume the first array is the value of gamma, and the second one is the error bar on it? – Mack Nov 23 '13 at 13:50
More or less, yes. The first is the best fit value for `gamma`, but the second is the variance (not really the error). I believe one estimate of error would be `sqrt(variance/t.size)`. – askewchan Nov 23 '13 at 17:03
I am actually having one problem with this code. The smaller I make gamma0, the smaller the fitted gamma value is. Eventually, when gamma0 = .0000000000000001, I get a [ 1.00000000e-16] inf as an output. Something seems amiss. I know gamma we are trying to find should be around 0.025 – Mack Nov 24 '13 at 22:11
The result can certainly depend on your first guess; there may be many local minima, so you won't necessarily get the one right answer. In fact, the fake data I was using to test this code was `y = cos(w*t) + noise` and I was fitting to `w`, and if I used `w0` far from the original value, it would fit to almost exactly `w/2`. If it is at all practical, your best bet is to try to plot the data, theory with `gamma0`, and theory with `gamma_fit` for several values of `gamma0` and see which are reasonable. – askewchan Nov 24 '13 at 22:27
And I just noticed another thing, when gamma0 becomes small enough, the gamma_best just ends up equaling gamma0. For instance, I set gamma0 = .0000000000000000000000000000000000000001 (yes, there 40 zeros), and what was returned was gamma_best = 1.00000000e-40 – Mack Nov 24 '13 at 22:27
I don't think there is much to be gained by pushing `gamma0` to `1e-40`, in fact it can be quite dangerous. I wouldn't trust any numerical results with values so small. If you know that `gamma > 0` but nothing about its magnitude, an appropriate guess would be `gamma0 = 1.0`. If you actually expect `gamma` to be much smaller than one, you should rewrite your equation to normalize it better, since these types of applications work best for numbers near 1. – askewchan Nov 24 '13 at 22:31
I don't know what you mean by, "rewrite your equation to normalize it better." – Mack Nov 24 '13 at 22:37
Sorry, I mean something like transform/rescale your variables so that their units are normalized to unity (`1`). I don't know what your data looks like, but for example, say you were measuring the mass and velocity of a tiny particle, all your numbers would be very tiny if measured in kg and m/s. If you transformed your data so that mass is measured in micrograms, and nanometers/sec, e.g., then the mass and velocity values would be near 1, and much more reliable in computation. You can rescale back to your original values after the optimization computation is done. – askewchan Nov 24 '13 at 22:41
There is some mention of it, and some other helpful advice here: http://stackoverflow.com/a/19544703/1730674 – askewchan Nov 24 '13 at 23:12

Fitting The Theoretical Equation To My Data

1 Answers1