
The basic question is this: let's say I am writing R functions that call Python via rPython, and I want to integrate them into a package. That's simple: it's irrelevant that the R function wraps around Python, and you proceed as usual. For example:

# trivial example; in a standalone script, attach rPython first
library(rPython)
add <- function(x, y) {
  python.assign("x", x)
  python.assign("y", y)
  python.exec("result = x + y")
  result <- python.get("result")
  return(result)
}

But what if the Python code inside these R functions requires users to import Python libraries first? For example:

# python code, not R
import numpy as np
print(np.sin(np.deg2rad(90)))

# R function that calls Python via rPython
# *this function will not run without first executing `import numpy as np`
print_sin <- function(degree){
   python.assign("degree", degree)
   python.exec('result = np.sin(np.deg2rad(degree))')
   result <- python.get('result')
   return(result)
}

If you run this without first importing numpy, you will get an error, because `np` is not defined in the Python session.

How do you import a Python library in an R package? How do you comment it with roxygen2?

It appears the standard approach in R is this:

# R function that calls Python via rPython
# the import is executed inside the function, so it runs on its own
print_sin <- function(degree){
   python.assign("degree", degree)
   python.exec('import numpy as np')
   python.exec('result = np.sin(np.deg2rad(degree))')
   result <- python.get('result')
   return(result)
}

Each time you run this R function, it will import an entire Python library.
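Whether re-running the import actually reloads the library can be checked on the Python side. The minimal sketch below uses the standard-library `json` module as a stand-in for numpy (an assumption, so it runs without numpy installed); `use_json` is a hypothetical helper that executes an import statement on every call, just as the R function above executes `import numpy as np` on every call.

```python
import sys


def use_json():
    # Hypothetical helper: the import statement runs on every call, the way
    # python.exec('import numpy as np') runs inside the R function.
    import json
    return json


first = use_json()
second = use_json()

# Both calls return the single module object stored in sys.modules, so the
# module's code ran no more than once in this process; later imports are
# only a cache lookup.
print(first is second is sys.modules["json"])  # prints True
```

Assuming rPython keeps one embedded interpreter alive for the whole R session, repeated `python.exec('import numpy as np')` calls would hit the same `sys.modules` cache rather than reloading numpy.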

Ben Bolker
ShanZhengYang
  • I'm confused by this whole series of questions. I've literally never used rPython, but just scanning the documentation I don't quite understand why you wouldn't simply call `python.exec("import numpy as np")` inside the function itself first? – joran Oct 20 '16 at 19:10
  • It's not ridiculous. "Importing an entire python library" twice is not a problem. Most python codes could have `import sys` in a hundred places. Having an `import` for every function is a good example of "explicit is better than implicit". Or have an R `.First` in your package that imports all your python libs, but then you have to keep that up to date, and that's probably a false optimisation. – Spacedman Oct 20 '16 at 19:10
  • @Spacedman Well, if you insist. Python maintains an internal list of all modules that have been imported. "Most python codes could have import sys in a hundred places." But they don't. Just as R libraries are not loaded with R functions, it's weird to load python libraries in functions. It also hits your performance, I would think. The second idea is probably better, to use `.First` – ShanZhengYang Oct 20 '16 at 19:24
  • `.First` is for sessions; you want `.onLoad` / `.onAttach` for a package. – Dirk Eddelbuettel Oct 20 '16 at 19:53
  • @ShanZhengYang , actually a fairly standard idiom in R functions is `if (require("packagename")) { ... } ` – Ben Bolker Oct 20 '16 at 20:00
  • Most python *modules* may only have one `import sys` but every module that needs `sys` will have `import sys`, so in a complex system of connected modules there could be lots of `import sys` going on. Please show some benchmarks to support your performance claim. A loop around `import numpy as np` should do it. – Spacedman Oct 20 '16 at 20:18
  • @Spacedman https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead – ShanZhengYang Oct 20 '16 at 20:42
  • @ShanZhengYang enjoy your 7/100000 of a second you save every time. – Spacedman Oct 20 '16 at 20:51
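The benchmark Spacedman asks for (a loop around an import statement) can be sketched with the standard `timeit` module. As before, `json` stands in for numpy as an assumption, and the repetition count is arbitrary; timings vary by machine, so no expected output is shown.

```python
import json  # noqa: F401  (loaded once up front; stands in for numpy)
import timeit

N = 100_000  # arbitrary number of repetitions

# Cost of re-executing an import statement for an already-loaded module.
import_secs = timeit.timeit("import json", number=N)

# Baseline: a statement that does essentially nothing.
baseline_secs = timeit.timeit("pass", number=N)

per_import_us = (import_secs - baseline_secs) / N * 1e6
print(f"about {per_import_us:.3f} microseconds per repeated import")
```

Comparing the per-import figure with the cost of the numerical work itself gives a rough sense of whether the repeated import matters for a given use case.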

1 Answer


As @Spacedman and @DirkEddelbuettel suggest, you could add an .onLoad/.onAttach function to your package that calls python.exec to import the modules that users of your package will almost always need.

You could also test whether the module has already been imported before importing it, but (a) that gets you into a bit of a regression problem, because you need to import sys in order to perform the test, and (b) the answers to that question suggest that, at least in terms of performance, it shouldn't matter, e.g.:

If you want to optimize by not importing things twice, save yourself the hassle because Python already takes care of this.

(although admittedly there is some discussion elsewhere on that page about possible scenarios where there could be a performance cost). But maybe your concern is stylistic rather than performance-oriented...
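The explicit check described in the answer might look like the sketch below. `ensure_imported` is a hypothetical helper, and `json` again stands in for numpy. Note that the check itself needs `sys`, which is the regression problem mentioned above, and it merely repeats the `sys.modules` lookup that a plain `import` statement already performs.

```python
import importlib
import sys


def ensure_imported(name):
    """Return module `name`, importing it only if it is not yet in sys.modules.

    Hypothetical helper: this mirrors the sys.modules lookup that the plain
    `import` statement already performs before loading anything.
    """
    module = sys.modules.get(name)
    if module is None:
        module = importlib.import_module(name)
    return module


# Bind the result to a name, as `import json as js` would (json stands in for numpy).
js = ensure_imported("json")
print(js is sys.modules["json"])  # prints True
```

One caveat for rPython: a module being in `sys.modules` does not by itself bind a name such as `np` in the namespace where `python.exec` runs, so the import statement (or an assignment like the one above) is still needed to make `np` available.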

Ben Bolker
  • Thanks for this! There is some mention of a performance overhead on that page, but I might just bite the bullet and write it into the R function. The motivation for asking this question was originally to find out what the "standard" was, and I think we've covered this. I appreciate the help! Thank you! – ShanZhengYang Oct 20 '16 at 20:43