
With the advent of reticulate, combining R and Python in a single .Rmd document has become increasingly popular within the R community (myself included). My personal workflow usually starts with an R script and, at some point, I create a shareable report using knitr::spin() with the plain .R document as input in order to avoid code duplication (see also Knitr's best hidden gem: spin for more on the topic).

However, as soon as Python code is involved in my analysis, I am currently forced to break this workflow and manually convert (i.e. copy and paste) my initial .R script into .Rmd before compiling the report. Does anybody know whether it is – or, for that matter, ever will be – possible to make knitr::spin() work with both R and Python code chunks in a single .R file without taking this detour? That is, just as it works when mixing the two languages, and exchanging objects between them, in a .Rmd file. To the best of my knowledge, there is currently no way to add something like engine = 'python' to spin documents.

fdetsch

1 Answer


Using reticulate::source_python() could be one solution.

For example, here is a simple .R script that will be spun to .Rmd and then rendered to .html:

spin-me.R

#'---
#'title: R and Python in a spin file.
#'---
#'
#' This is an example of how to write a single R script that contains both R
#' and Python code and can be spun to .Rmd via knitr::spin().
#'
#+ label = "setup"
library(nycflights13)
library(ggplot2)
library(reticulate)
use_condaenv()

#'
#' Create the file flights.csv for the Python code to read.
#'
#+ label = "create_flights_csv"
write.csv(flights, file = "flights.csv")

#'
#' The file flights.py reads in the data from flights.csv. It can be evaluated
#' in this script via source_python(), which should add a data.frame called
#' `py_flights` to the workspace.
source_python(file = "flights.py")

#'
#' And now, plot the results.
#'
#+ label = "plot"
ggplot(py_flights) + aes(carrier, arr_delay) + geom_point() + geom_jitter()


# /* spin and knit this file to html
knitr::spin(hair = "spin-me.R", knit = FALSE)
rmarkdown::render("spin-me.Rmd")
# */

The Python file is:

flights.py

import pandas
py_flights = pandas.read_csv("flights.csv")
py_flights = py_flights[py_flights['dest'] == "ORD"]
py_flights = py_flights[['carrier', 'dep_delay', 'arr_delay']]
py_flights = py_flights.dropna()

And a screen capture of the resulting .html is:

[Image: screenshot of the rendered spin-me.html]

EDIT: If keeping everything in one file is a must, you could write the Python code to a file from within the R script before the source_python() call, e.g.:

pycode <-
'import pandas
py_flights = pandas.read_csv("flights.csv")
py_flights = py_flights[py_flights["dest"] == "ORD"]
py_flights = py_flights[["carrier", "dep_delay", "arr_delay"]]
py_flights = py_flights.dropna()
'
cat(pycode, file = "temp.py")
source_python(file = "temp.py")
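A variant of the snippet above, sketched under the assumption that reticulate and a Python environment with pandas are available: writing the string to a session temporary file via base R's tempfile() keeps a stray temp.py out of the working directory, and unlink() removes it once source_python() has run. The name py_file is illustrative, and row.names = FALSE is a small deviation from spin-me.R so pandas does not see an unnamed index column.

```r
library(nycflights13)
library(reticulate)

# Same data as in spin-me.R (row names dropped so pandas gets clean columns)
write.csv(flights, file = "flights.csv", row.names = FALSE)

# The Python code lives in the .R script as a string
pycode <- '
import pandas
py_flights = pandas.read_csv("flights.csv")
py_flights = py_flights[py_flights["dest"] == "ORD"]
py_flights = py_flights[["carrier", "dep_delay", "arr_delay"]]
py_flights = py_flights.dropna()
'

# Write it to a throwaway file in the session's temp directory
py_file <- tempfile(fileext = ".py")
writeLines(pycode, py_file)

# source_python() runs the file and assigns its top-level objects
# (including py_flights, converted to a data.frame) into the global environment
source_python(py_file, envir = globalenv())

# Remove the temporary file again
unlink(py_file)
```

The trade-offs mentioned below (no syntax highlighting, harder reuse) still apply, since the Python code is still an R string.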

In my opinion, keeping the Python code in its own file is preferable to generating it from the R script, for two reasons:

  1. Easier reuse of the Python code
  2. Syntax highlighting in my IDE is lost for the Python code when it is written as a string and not in its own file.
Peter
  • Great solution, thanks. The only downside here is that you still have to maintain two separate files. But judging from the answer by @cderv provided [here](https://github.com/yihui/knitr/issues/1773), using `source_python()` or switching to `.Rmd` seem to be the only valid options at the moment. – fdetsch Nov 19 '19 at 16:19
  • @fdetsch, I've added an edit to the answer. Writing Python in a string, sending that string to a file, and then using `source_python` could support a one-file development model. – Peter Nov 19 '19 at 17:28
  • ... or simply `py_run_string(pycode)`, i.e. without writing the code to disk. This seems doable for short Python code chunks. However, I fully agree with you that for longer Python code, reusability and syntax highlighting justify the use of separate `.py` scripts. See also the section "Executing Code" in [Calling Python from R](https://rstudio.github.io/reticulate/articles/calling_python.html). – fdetsch Nov 19 '19 at 18:23
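The py_run_string() route from the last comment can be sketched as follows, again assuming reticulate and a Python environment with pandas. Unlike source_python(), py_run_string() does not copy the objects it creates into an R environment; they stay in Python's main module and are read back through reticulate's py object. The name r_flights is illustrative.

```r
library(nycflights13)
library(reticulate)

# Data for the Python code, as in spin-me.R (row names dropped)
write.csv(flights, file = "flights.csv", row.names = FALSE)

pycode <- '
import pandas
py_flights = pandas.read_csv("flights.csv")
py_flights = py_flights[py_flights["dest"] == "ORD"]
py_flights = py_flights[["carrier", "dep_delay", "arr_delay"]]
py_flights = py_flights.dropna()
'

# Execute the string in Python's __main__ module; no .py file is written
invisible(py_run_string(pycode))

# `py` exposes __main__; accessing the pandas DataFrame through it
# converts it to an R data.frame
r_flights <- py$py_flights
head(r_flights)
```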