
Is there a way to delete all comments in an R script using RStudio?

I need to shrink a file to the smallest size possible. However, this file is heavily commented.

If I am right, RStudio's search-and-replace function, which supports regular expressions, might help with this.

I appreciate any help.

majom
    What character is used to comment? `#`? – rmbaughman May 13 '14 at 12:00
    Try `#.*` as the regex. – Roland May 13 '14 at 12:02
  • Whether or not it's possible with regexes: see e.g. http://stackoverflow.com/questions/2319019/. Anyway, it's a long story; in short, only a parser will save you. – gagolews May 13 '14 at 12:18
  • Would you do even better if you saved it as an R binary object? It would have to be a function, rather than a script as such, but you can just wrap it in a `function()` call and job done. Then instead of sourcing, you load the binary and call the function. – Spacedman May 13 '14 at 13:15
    Well, not exactly an "RStudio" answer, but you can run your source through [formatR](http://cran.r-project.org/web/packages/formatR/formatR.pdf) and there's an option (in one of the functions) to strip comments (IIRC) – hrbrmstr May 13 '14 at 15:31
  • @hrbrmstr: formatR is already mentioned below – gagolews May 13 '14 at 17:01
  • Apologies, I was on mobile and failed to look at the answers first. – hrbrmstr May 13 '14 at 17:33
  • Thanks for all the comments. They helped me a lot. – majom May 13 '14 at 19:09
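Spacedman's binary-object suggestion above could look roughly like this. It is a minimal sketch under assumptions, not his code: `run_script` is a hypothetical stand-in for the script's body, `utils::removeSource()` drops the stored source text (which is where comments would survive when `keep.source` is on), and `saveRDS()` with `compress = "xz"` writes the binary file. Whether the result is actually smaller than a comment-stripped text file is worth checking with `file.size()`.

```r
# Hypothetical stand-in: the whole script wrapped in a function.
run_script <- function(n) {
  # With options(keep.source = TRUE), this comment is kept in the srcref
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# Drop the stored source text (and with it the comments), keeping the code.
run_script <- utils::removeSource(run_script)

# Save as an xz-compressed binary object instead of a script.
path <- tempfile(fileext = ".rds")
saveRDS(run_script, path, compress = "xz")

# Later: load the object and call it instead of source()-ing the script.
loaded <- readRDS(path)
loaded(4)  # 10
```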

1 Answer


I wouldn't approach this task with regexes. It may work, but only in simple cases. Consider the following /tmp/test.R script:

```r
x <- 1 # a comment
y <- "#######"
z <- "# not a comment \" # not \"" # a # comment # here

f <- # a function
   function(n) {
for (i in seq_len(n))
print(i)} #...
```

As you can see, it is not trivial to determine where a comment actually starts: a `#` inside a string literal does not start one.
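To make the problem concrete, here is what the regex suggested in the comments (`#.*`) does to the first two lines of that script. This is only an illustration using base R's `sub()` on strings copied from /tmp/test.R above.

```r
# First two lines of /tmp/test.R, as character strings
src <- c('x <- 1 # a comment',
         'y <- "#######"')

# Naive approach: delete everything from the first "#" to the end of the line
naive <- sub("#.*", "", src)

# Line 1 is fine, but line 2 loses its closing quote ...
naive[2]
# ... so the "stripped" code no longer parses
ok <- tryCatch({ parse(text = naive[2]); TRUE }, error = function(e) FALSE)
ok  # FALSE
```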

If you don't mind your code being reformatted (and since you want the smallest file possible, you probably don't), try the following:

```r
writeLines(as.character(parse("/tmp/test.R")), "/tmp/out.R")
```

which will give /tmp/out.R with:

```r
x <- 1
y <- "#######"
z <- "# not a comment \" # not \""
f <- function(n) {
    for (i in seq_len(n)) print(i)
}
```
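If you would rather keep the original layout and drop only the comments, the parser can still do the locating. The sketch below is one possible approach, not part of this answer: `strip_comments` is a hypothetical helper that asks `utils::getParseData()` for the `COMMENT` tokens and cuts each one off the end of its line. It assumes each element of `lines` is a single line without embedded newlines (e.g. from `readLines()`).

```r
# Hypothetical helper: remove comments while leaving the rest of each line intact.
strip_comments <- function(lines) {
  # keep.source = TRUE is required for getParseData(); Rscript defaults to FALSE
  exprs <- parse(text = lines, keep.source = TRUE)
  pd <- utils::getParseData(exprs)
  comments <- pd[pd$token == "COMMENT", c("line1", "text")]
  for (k in seq_len(nrow(comments))) {
    i <- comments$line1[k]
    txt <- as.character(comments$text[k])
    # A comment runs to the end of its line, so it should be a suffix of it;
    # if that check fails, the line is left unchanged rather than guessed at
    if (endsWith(lines[i], txt)) {
      kept <- substr(lines[i], 1, nchar(lines[i]) - nchar(txt))
      lines[i] <- sub("[ \t]+$", "", kept)  # drop spaces left before the "#"
    }
  }
  lines
}

# The /tmp/test.R example from above, as character strings
src <- c('x <- 1 # a comment',
         'y <- "#######"',
         'z <- "# not a comment \\" # not \\"" # a # comment # here',
         '',
         'f <- # a function',
         '   function(n) {',
         'for (i in seq_len(n))',
         'print(i)} #...')
out <- strip_comments(src)
writeLines(out)
```

With a real file you would call something like `writeLines(strip_comments(readLines("/tmp/test.R")), "/tmp/out.R")`. Comment-only lines become empty strings rather than being removed, so this saves less space than the `parse()` approach.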

Alternatively, use a function from the formatR package:

```r
library(formatR)
tidy_source(source="/tmp/test.R", keep.comment=FALSE)
## x <- 1
## y <- "#######"
## z <- "# not a comment \" # not \""
## f <- function(n) {
##     for (i in seq_len(n)) print(i)
## }
```

BTW, `tidy_source` has a `blank` argument, which might be of interest to you, but I can't get it to work with formatR 0.10 and R 3.0.2.

Ferroao
gagolews
    Exactly what I was thinking! And this should also reduce the number of blanks, so it minifies the code :) – digEmAll May 13 '14 at 12:15