Python rpy2 - nls regression RRuntimeError

Question

I am trying to run an nls regression using R from within Python. I am stuck on an RRuntimeError that is well outside my expertise. I have struggled for a few days to get it working and would appreciate some help.

This is my csv of data: http://www.sharecsv.com/s/4cdd4f832b606d6616260f9dc0eedf38/ratedata.csv

This is my code:

```python
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
pandas2ri.activate()

dfData = pd.read_csv('C:\\Users\\nick\\Desktop\\ratedata.csv')
rdf = pandas2ri.py2ri(dfData)

a = 0.5
b = 1.1
count = rdf.rx(True, 'Trials')
rates = rdf.rx(True, 'Successes')

base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})

my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')

d = ro.ListVector({'a': a, 'b': b})

fit = stats.nls(my_formula, weights=count, start=d)
```

Everything runs except this line:

```python
fit = stats.nls(my_formula, weights=count, start=d)
```

I am getting the following traceback:

```
---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-12-3f7fcd7d7851> in <module>()
      6 d = ro.ListVector({'a': a, 'b': b})
      7 
----> 8 fit = stats.nls(my_formula, weights=count, start=d)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    176                 v = kwargs.pop(k)
    177                 kwargs[r_k] = v
--> 178         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
    179 
    180 pattern_link = re.compile(r'\\link\{(.+?)\}')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    104         for k, v in kwargs.items():
    105             new_kwargs[k] = conversion.py2ri(v)
--> 106         res = super(Function, self).__call__(*new_args, **new_kwargs)
    107         res = conversion.ri2ro(res)
    108         return res

RRuntimeError: Error in (function (formula, data = parent.frame(), start, control = nls.control(),  : 
  parameters without starting value in 'data': rates, count
```

I would be eternally thankful if anyone can see where I am going wrong, or can offer advice. All I want is the two numbers from that formula back in Python so I can use those to construct some confidence intervals.

Thank you

Very important: your `a` is posted here as 0.5, but in the original R script in your [previous post](https://stackoverflow.com/questions/48442703/rpy2-r-object-has-no-attribute-nls) it was 0.05. The larger value will cause a gradient error in `nls()`. — Parfait, Jan 26 '18 at 16:32

Parfait · Accepted Answer · 2018-01-26T19:58:39.150

Put all the variables the formula uses into a single data frame and pass it through the `data` argument. R resolves the names in the formula inside the R environment, but `rates` and `count` exist only as Python variables, so `nls()` cannot find them and reports them as parameters without starting values. Once everything lives in one object, you can run `nls` with either the pandas DataFrame or the converted R data frame:

```python
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri

base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})

a = 0.05
b = 1.1
d = ro.ListVector({'a': a, 'b': b})

dfData = pd.read_csv('Input.csv')
dfData['count'] = dfData['Trials'].astype('float')
dfData['rates'] = dfData['Successes'] / dfData['Trials']
dfData['a'] = a
dfData['b'] = b

pandas2ri.activate()

rdf = pandas2ri.py2ri(dfData)

my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')

# WITH PANDAS DATAFRAME
fit = stats.nls(formula=my_formula, data=dfData, weights=dfData['count'], start=d)
print(fit)

# WITH R DATAFRAME
fit = stats.nls(formula=my_formula, data=rdf, weights=rdf.rx(True, 'count'), start=d)
print(fit)
```

Alternatively, assign the variables into R's global environment through `ro.globalenv` and omit the `data` argument:

```python
ro.globalenv['rates'] = dfData['rates']
ro.globalenv['count'] = dfData['count']
ro.globalenv['a'] = dfData['a']
ro.globalenv['b'] = dfData['b']

fit = stats.nls(formula=my_formula, weights=dfData['count'], start=d)
print(fit)

# Nonlinear regression model
#   model: rates ~ 1 - (1/(10^(a * count^(b - 1))))
#    data: parent.frame()

#       a       b
# 0.01043 1.24943
#  weighted residual sum-of-squares: 14.37

# Number of iterations to convergence: 6
# Achieved convergence tolerance: 9.793e-07

# To return parameters
num = fit.rx('m')[0].names.index('getPars')
obj = fit.rx('m')[0][num]()

print(obj[0])
# 0.010425686223717435

print(obj[1])
# 1.2494303314553932
```

Equivalently in R:

```r
dfData <- read.csv('Input.csv')

a <- .05
b <- 1.1
d <- list(a=a, b=b)

dfData$count <- dfData$Trials
dfData$rates <- dfData$Successes / dfData$Trials
dfData$a <- a
dfData$b <- b

my_formula <- stats::as.formula("rates ~ 1-(1/(10^(a * count ^ (b-1))))")

fit <- stats::nls(my_formula, data=dfData, weights=dfData$count, start=d)
print(fit)

# Nonlinear regression model
#   model: rates ~ 1 - (1/(10^(a * count^(b - 1))))
#    data: dfData
#       a       b
# 0.01043 1.24943
#  weighted residual sum-of-squares: 14.37

# Number of iterations to convergence: 6
# Achieved convergence tolerance: 9.793e-07

# To return parameters
fit$m$getPars()['a']
# 0.01042569

fit$m$getPars()['b']
# 1.24943
```
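Once `a` and `b` are back in Python, it can help to sanity-check them without going through R. The sketch below re-implements the model from the formula and the weighted sum of squared residuals (with `weights` given, `nls` fits by weighted least squares) in plain Python. The function names `model` and `weighted_rss` and the three data points are made up for illustration; they are not the contents of ratedata.csv, so the printed value will not match the 14.37 reported above.

```python
def model(count, a, b):
    """Predicted rate from the formula: 1 - 1 / 10^(a * count^(b - 1))."""
    return 1.0 - 1.0 / 10.0 ** (a * count ** (b - 1.0))


def weighted_rss(counts, rates, weights, a, b):
    """Sum of weight * squared residual over all observations."""
    return sum(
        w * (r - model(c, a, b)) ** 2
        for c, r, w in zip(counts, rates, weights)
    )


# Hypothetical observations, NOT the data from the question.
counts = [10.0, 100.0, 1000.0]
rates = [0.05, 0.08, 0.12]

# Fitted parameters copied from the getPars() output above.
a_hat, b_hat = 0.010425686223717435, 1.2494303314553932

# Predicted rate at each count, then the weighted RSS using count as the weight.
print([model(c, a_hat, b_hat) for c in counts])
print(weighted_rss(counts, rates, counts, a_hat, b_hat))
```

Passing your real `count` and `rates` columns (as plain lists) with `count` as the weights should reproduce the weighted residual sum-of-squares that `print(fit)` shows, which is a quick check that the extracted parameters are the ones R fitted.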

Parfait, thank you ever so much for the time you have taken to help and for the multiple solutions. It works perfectly and I now understand where I was going wrong. I really can't thank you enough! I hope anyone else who comes across this post is helped too. Have a great weekend! — Nicholas, Jan 26 '18 at 17:05
Oh, there is one last thing, and if you have a PayPal, I will gladly buy you a pint! (message me!)... is there an easy way of grabbing those coefficients (a 0.01043 b 1.24943) and putting them in a variable? I was doing it in a weird way (putting fit in a string and then extracting the string between other strings lol, which isn't very elegant! lol) — Nicholas, Jan 26 '18 at 17:53
Glad to help! If you read the R docs `?nls` you will see the return value is a list of items. Under `m`, there is a function `getPars()` to return those needed estimates. In R it is a straightforward named vector: `fit$m$getPars()`. In Python: `fit.rx('m')[0][11]()`. See updates above. — Parfait, Jan 26 '18 at 20:00
Wow. Awesome! Works perfectly!... so much better than my awful string splicing technique using base.summary(fit)!.... thank you SO much! — Nicholas, Jan 26 '18 at 20:24
Hey, just going back through the code, and noticed that I assign something to 'base' but then never use it. Why do I have this code in there? :)... base = importr('base', robject_translations={'with': '_with'}) — Nicholas, Feb 14 '18 at 19:05
Yes, no need for *base* here. It might have stayed on during revisions of this and previous question. Keep if needing to use any base R's functions as original [R code does](https://stackoverflow.com/q/48442703/1422451). — Parfait, Feb 14 '18 at 19:15
