
I am using rpy2 to do some non-linear regression in R from Python.

import rpy2.robjects as robjects
from rpy2.robjects import DataFrame, Formula
from rpy2.robjects import r
import rpy2.robjects.numpy2ri as npr
import numpy as np
from rpy2.robjects.packages import importr

r.nls(rates * 1-(1/(10^(a * count ^ (b-1)))), weights=count, start=list(a=a, b=b))

I have the following errors:

LookupError: 'nls' not found
AttributeError: 'R' object has no attribute 'nls'

Python also flags '~' as invalid syntax. I changed it to * to get past that, but I do need it to be '~'.

Any ideas on what is going wrong?

The code works fine in R.

This is the full code that works fine in R:

#This recipe assumes that the data is in a csv file called 'ratedata.csv' and that the values are in columns titled:
#Entity, Trials and Successes 
#Data must be sorted in order of number of applications (i.e. the 'Trials' column) highest to lowest.

data <- read.csv("ratedata.csv")  # get the data
count <- data$Trials  # define count as the number of trials
rates <- data$Successes / data$Trials  # define rate as the success rate for each entity
a <- .05  # set initial values for a and b to generate predicted rates
b <- 1.1  # these values need to be reasonably sensible otherwise the later estimate will not converge sensibly
fit <- nls(rates ~ 1-(1/(10^(a * count ^ (b-1)))), weights=count, start=list(a=a, b=b))  # non-linear least squares fit of data, weighted by count (weighting is optional but helps if it won't converge sensibly)
summary(fit)  # to show estimates of a and b
coef <- as.vector(coef(fit))  # extract the coefficients into a vector for re-use
a <- coef[[1]]  # extract the calculated coefficient for a
b <- coef[[2]]  # extract the calculated coefficient for b
confidence <- confint(fit)
intervals <- as.vector(confidence[c(2,4)])
predopt <- 1-(1/(10^(a * count ^ (b-1))))  # predict rate by count with optimised coefficients
se <- sqrt((predopt * (1-predopt)) / count)  # calculate standard error for predicted rate
upper95 <- predopt + 2*se  # upper 95% limit - roughly speaking. Wald interval is appropriate in this case.
lower95 <- predopt - 2*se  # lower 95% limit
upper99 <- predopt + 3*se  # upper 99% limit
lower99 <- predopt - 3*se  # lower 99% limit
xlim <- range(count + 10)  # setup plot
ylim <- range(c(upper99, 0))  # lower limit truncated at zero
plot(count, rates, xlim = xlim, ylim = ylim, xlab = "Trials", ylab = "Predicted rate",
     pch = 21, col = "navajowhite4", bg = "mistyrose4")  # plot rates by organisation; axis limits and labels go here, lines() only draws
lines(count, predopt, type = "l", col = "red")  # plot predicted rate
lines(count, upper95, lty = "dashed")  # plot upper limit
lines(count, lower95, lty = "dashed")  # plot lower limit
lines(count, upper99, lty = "dotted")  # plot upper limit
lines(count, lower99, lty = "dotted")  # plot lower limit
cat("The least-squares values of a and b are", coef[[1]], "and", coef[[2]], "respectively", "\n")
print(confint(fit))
if (intervals[[1]] < 1 && intervals[[2]] > 1) {
  message("There is probably no relationship between success rate and number of trials")
} else {
  message("There is probably a relationship between success rate and number of trials")
}
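Since the eventual goal is a Python tool, it helps to note that everything after the fit (predopt, se and the 95%/99% bands) is plain arithmetic on a, b and count. The following is a minimal pure-Python sketch of that part. It assumes you already have fitted values for a and b. The values 0.05 and 1.1 used here are only the script's starting guesses, acting as placeholders, and the function names are hypothetical.

```python
import math


def predicted_rate(a, b, count):
    """Model from the R script: predopt = 1 - 1/10^(a * count^(b - 1))."""
    return 1 - 1 / 10 ** (a * count ** (b - 1))


def rate_band(a, b, count, z):
    """Approximate Wald band predopt +/- z * se, with se = sqrt(p * (1 - p) / count).

    z = 2 mirrors upper95/lower95 in the script; z = 3 mirrors upper99/lower99.
    Returns (lower, predicted, upper).
    """
    p = predicted_rate(a, b, count)
    se = math.sqrt(p * (1 - p) / count)
    return p - z * se, p, p + z * se


# Placeholder coefficients (the script's starting values, NOT fitted estimates).
a, b = 0.05, 1.1
for count in (129, 1000, 2359):  # 129 and 2359 are the Trials extremes given in the question
    lower95, p, upper95 = rate_band(a, b, count, 2)
    lower99, _, upper99 = rate_band(a, b, count, 3)
    print(f"count={count}: rate={p:.4f}, 95% [{lower95:.4f}, {upper95:.4f}], "
          f"99% [{lower99:.4f}, {upper99:.4f}]")
```

Like the R script, the band width shrinks as count grows, because se is divided by count under the square root.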

The Trials and Successes columns are just two columns of 48 integers (the exact values don't matter). Trials ranges from 129 to 2359, and Successes ranges from 8 to 365.
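The final if in the script is easy to misread. confint(fit) returns a 2 × 2 matrix, with rows a and b and columns 2.5% and 97.5%. R stores matrices column by column, so confidence[c(2,4)] picks out the lower and upper bounds for b. The test therefore asks whether the interval for b contains 1. When b = 1, count^(b - 1) = 1 and the predicted rate is the constant 1 - 10^(-a), so it does not depend on the number of trials.

Below is a minimal Python sketch of the same logic. It assumes the interval is supplied as a nested list [[a_lo, a_hi], [b_lo, b_hi]]. That layout, the helper names and the example numbers are all hypothetical.

```python
def column_major(matrix):
    """Flatten a row-major nested list the way R stores a matrix (column by column)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]


def relationship_message(conf):
    """Mirror the script's check on confint(fit).

    conf is [[a_lo, a_hi], [b_lo, b_hi]]. R's confidence[c(2, 4)] is 1-based,
    so it corresponds to 0-based positions 1 and 3 of the column-major list.
    """
    flat = column_major(conf)
    b_lo, b_hi = flat[1], flat[3]
    if b_lo < 1 < b_hi:  # the interval for b contains 1
        return "There is probably no relationship between success rate and number of trials"
    return "There is probably a relationship between success rate and number of trials"


# Hypothetical intervals, for illustration only.
print(relationship_message([[0.03, 0.07], [0.95, 1.20]]))
print(relationship_message([[0.03, 0.07], [1.05, 1.20]]))
```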


Update (19:40, 25 Jan 2018)

Current code is:

import rpy2.robjects as ro
from rpy2.robjects.packages import importr

count = ro.IntVector([1,2,3,4,5])
rates = ro.IntVector([1,2,3,4,5])
a = ro.FloatVector([0.5])
b = ro.FloatVector([1.1])

base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})

my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')

d = ro.ListVector({'a': a, 'b': b})

fit = stats.nls(my_formula, weights=count, start=d)

I am getting the error:

---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-2-3f7fcd7d7851> in <module>()
      6 d = ro.ListVector({'a': a, 'b': b})
      7 
----> 8 fit = stats.nls(my_formula, weights=count, start=d)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    176                 v = kwargs.pop(k)
    177                 kwargs[r_k] = v
--> 178         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
    179 
    180 pattern_link = re.compile(r'\\link\{(.+?)\}')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2\robjects\functions.py in __call__(self, *args, **kwargs)
    104         for k, v in kwargs.items():
    105             new_kwargs[k] = conversion.py2ri(v)
--> 106         res = super(Function, self).__call__(*new_args, **new_kwargs)
    107         res = conversion.ri2ro(res)
    108         return res

RRuntimeError: Error in (function (formula, data = parent.frame(), start, control = nls.control(),  : 
  parameters without starting value in 'data': rates, count

My guess is that count and rates are not the right type (should they be lists?), or that something else is wrong. I have tried converting them in various ways, but to no avail. Any help much appreciated!

This is the code I used to build the data frame:

dataf = ro.DataFrame({})
d = {'count': ro.IntVector((1,2,3,4,5)),'rates': ro.IntVector((1,2,3,4,5))}
dataf = ro.DataFrame(d)
count = dataf.rx(True, 'count')
rates = dataf.rx(True, 'rates')
Nicholas
  • Can you set up a reproducible example? What is *count*, *rate*, *a*? Also, can you show the R version that works fine? – Parfait Jan 25 '18 at 15:23
  • Hey, thank you. I have added some more detail :) – Nicholas Jan 25 '18 at 15:27
  • Essentially a colleague did this in R, and I am trying to replicate it in Python (I struggled to do it in pure Python, so I thought I would try using R). I eventually want to put it in a tool for staff where I work. – Nicholas Jan 25 '18 at 15:28

1 Answer


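Consider importing R's stats and base packages with importr() and then replicating the calls you need through them. This is also how you reach nls, which the bare r object failed to find. Use as_formula to convert a string representation of the formula into an actual formula object. Because the ~ then sits inside a string, Python no longer treats it as a syntax error. Since these are default R packages, check which function belongs to which package, e.g. stats::nls() and base::list().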

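Note also that, to fit Python's syntax rules, periods in R names are converted to underscores (e.g. as.formula becomes as_formula). A few other names are renamed explicitly through robject_translations to avoid clashes, either with Python keywords such as with or with another translated name.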
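To see why explicit robject_translations can be needed, here is a pure-Python sketch of the period-to-underscore rule and the clash it can produce. It only mimics the idea and does not call rpy2. The helper names are hypothetical. The assumption that this version of stats exports both format.perc and format_perc is inferred from the translation used in the answer's code.

```python
from collections import defaultdict


def r_to_python_name(rname):
    """Sketch of the default rule: periods in R names become underscores."""
    return rname.replace('.', '_')


def find_clashes(rnames, translations=None):
    """Group R names by their Python name and return groups with more than one member.

    translations plays the role of importr's robject_translations: explicit
    R-name -> Python-name overrides applied instead of the default rule.
    """
    translations = translations or {}
    groups = defaultdict(list)
    for rname in rnames:
        groups[translations.get(rname, r_to_python_name(rname))].append(rname)
    return {py: rs for py, rs in groups.items() if len(rs) > 1}


# A few names, including the pair the answer's translation appears to work around (assumed).
names = ['as.formula', 'nls', 'format.perc', 'format_perc']
print(find_clashes(names))  # clash on 'format_perc'
print(find_clashes(names, {'format_perc': '_format_perc'}))  # no clash
```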

...
import rpy2.robjects as ro
from rpy2.robjects.packages import importr

base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})

my_formula = stats.as_formula('rates ~ 1-(1/(10^(a * count ^ (b-1))))')

d = ro.ListVector({'a': a, 'b': b})

fit = stats.nls(my_formula, weights=count, start=d)
Parfait
  • Thank you so much. I really appreciate your help :) – Nicholas Jan 25 '18 at 17:50
  • Hey, still working on this after I got home from work. I have made the variables for a, b, count and rates but am getting errors. I don't suppose you have any idea? (I have added more details to my original question.) Don't worry if you are busy etc., I am thankful for where you got me already! :) – Nicholas Jan 25 '18 at 19:44
  • Try binding all those vectors into a dataframe where *rates*, *count*, *a*, *b* are columns and then run formula. – Parfait Jan 25 '18 at 20:45
  • I'll give it a go. Thank you :) – Nicholas Jan 25 '18 at 20:45
  • Still getting the error. I put count and rates into a dataframe (added the code to the question), but couldn't add 'a' and 'b' as they are only one value each (not sure if they all needed to be in). Thank you anyway for your help. I will probably put a bounty on this question in a couple of days if I am still struggling :) Have a good night! – Nicholas Jan 25 '18 at 21:41
  • You should ask a new question, as this strays from this question's original issue since you got `nls()` to work. Also, try importing the CSV into a Python pandas DataFrame and then running the `nls` method. You may need to convert the pandas df to an R df: `rdf = pandas2ri.py2ri(pydf)` – Parfait Jan 25 '18 at 22:36
  • Thank you Parfait. I'll try all this tomorrow morning, and if I need to I'll make a new question, as I agree my question has strayed a little and may be a bit too much for someone who stumbles across this in the future! – Nicholas Jan 25 '18 at 22:39