
Short version

Can I replace

source(filename, local = TRUE, encoding = 'UTF-8')

with

eval(parse(filename, encoding = 'UTF-8'))

without any risk of breakage, to make UTF-8 source files work on Windows?

Long version

I am currently loading specific source files via

source(filename, local = TRUE, encoding = 'UTF-8')

However, it is well known that this does not work on Windows, full stop.

As a workaround, Joe Cheng suggested using instead

eval(parse(filename, encoding = 'UTF-8'))

This seems to work quite well¹, but even after consulting the source code of source, I don’t understand how they differ in one crucial detail:

Neither source nor sys.source simply parses and then evals the file content. Instead, they parse the file content, then iterate manually over the parsed expressions and eval them one by one. I do not understand why this would be necessary in sys.source (source at least uses it to show verbose diagnostics, if so instructed; but sys.source does nothing of the kind):

for (i in seq_along(exprs)) eval(exprs[i], envir)

What is the purpose of evaling statements separately? And why is it iterating over indices instead of directly over the sub-expressions? What other caveats are there?

To clarify: I am not concerned about the additional parameters of source and parse, some of which may be set via options.


¹ The reason that source is tripped up by the encoding but parse isn’t boils down to the fact that source attempts to convert the input text. parse does no such thing: it reads the file’s byte content as-is and simply marks its Encoding as UTF-8 in memory.
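
The behaviour described in this footnote can be checked with a small sketch. It assumes a made-up one-line script containing a non-ASCII string literal; the temporary file and the variable names are illustrative only. The script is written out as raw UTF-8 bytes and then loaded with eval(parse(...)), so that the encoding mark and the bytes of the resulting string can be inspected.

```r
# Hypothetical one-line script containing a non-ASCII string literal.
code_line <- "s <- \"caf\u00e9\""

# Write the UTF-8 bytes unchanged, regardless of the session's locale.
path <- tempfile(fileext = ".R")
con <- file(path, open = "wb")
writeLines(enc2utf8(code_line), con, useBytes = TRUE)
close(con)

# Parse with the encoding declared, then evaluate into a fresh environment.
env <- new.env()
eval(parse(path, encoding = "UTF-8"), env)

string_encoding <- Encoding(env$s)  # encoding mark on the resulting string
string_bytes <- charToRaw(env$s)    # raw bytes, to compare with the file content
unlink(path)
```

If the footnote is right, string_encoding should be "UTF-8" and string_bytes should match the bytes that were written to the file.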

Konrad Rudolph
  • Why are you doing any of those? Put your code in packages. – Spacedman Jun 27 '14 at 14:44
  • @Spacedman Why do these commands even exist then? (For a more specific reason: because I’m working on an [alternative package system](https://github.com/klmr/modules).) – Konrad Rudolph Jun 27 '14 at 14:52
  • @Spacedman [You know](http://stackoverflow.com/questions/15789036/namespaces-without-packages#comment22478647_15789538) he doesn't want to, so why [keep poking](https://stat.ethz.ch/pipermail/r-devel/2014-April/068926.html) that bear (especially on an only tangentially related question such as this)?! – Josh O'Brien Jun 27 '14 at 14:56
  • @JoshO'Brien Ah yes. I’ve done that. – Konrad Rudolph Jun 27 '14 at 15:15
  • Ah. Still working on that module system. Fairy Nuff. – Spacedman Jun 27 '14 at 16:20
  • So is this question purely "What's the difference between `eval(exprs)` and `for(ex in exprs)eval(ex)` and `for(i in seq_along(exprs))eval(exprs[i])`?"? (give or take an `envir` here and there) It's a bit messy as it stands. I mean, all the UTF stuff is irrelevant, and your real question comes in halfway through. – Spacedman Jun 27 '14 at 16:38
  • @Spacedman Not purely, since the sources of `source` and `parse` are long and obscure and I might have missed another important difference – notably with the involvement of `srcfilecopy`, which I do not entirely understand. And providing context is generally seen as crucial on Stack Overflow, in order to determine whether the OP has fallen prey to an [XY problem](http://meta.stackexchange.com/q/66377). My *actual* question is therefore the one in the title, and explained by the first few paragraphs. – Konrad Rudolph Jun 27 '14 at 16:51
  • So just to clarify a bit more: Joe Cheng's `eval(parse(...))` workaround is so far working just fine, but you're wanting to know if there are any consequential differences that might, at some point, bite, right? And you're naturally a bit uneasy, since you're not fully following what the source code does (and where you are following it, you're not always understanding its rationale)... Is your ultimate (but maybe too-vague-for-SO) question really, "can somebody confirm that `eval(parse(...))` is an everywhere reliable replacement for `source(...)`"? – Josh O'Brien Jun 27 '14 at 17:05
  • @Josh Yes. All my tests pass but they are of course not exhaustive and I cannot test easily on Windows. Furthermore, I intended to reference this discussion in a source code comment as an explanation. – Konrad Rudolph Jun 28 '14 at 11:16
  • The [**evaluate**](http://cran.cnr.berkeley.edu/web/packages/evaluate/index.html) package authors (Yihui, Hadley, and Barret Schloerke) might have some interesting insight into your question. **evaluate** underlies **knitr**'s statement-by-statement evaluation of R code, and it works by parsing and then "manually" iterating over parsed expressions, evaluating each in turn. If anybody would know about the potential "gotcha"s of doing or not doing that, I'd think they might be the ones. – Josh O'Brien Jun 28 '14 at 18:48
  • @Autar The “source” tag in this instance referred to a specific technique (actually, the `source` function) — not the generic “source code”. I’m pretty sure that’s not meant by the ban of the “source” tag. That said, it’s maybe a little too specific. – Konrad Rudolph Aug 14 '15 at 18:59

1 Answer


This is not a full answer, as it primarily addresses the seq_along part of the question, but it is too lengthy to post as a comment.

One key difference between using seq_along followed by [ and simply using for (i in x) (which I believe is similar to seq_along followed by [[ instead of [) is that the former keeps each element as an expression vector, together with its srcref attribute. Here is an example to illustrate the difference:

> txt <- "x <- 1 + 1
+ # abnormal expression
+   2 *
+     3
+ "
> x <- parse(text=txt, keep.source=TRUE)
> 
> for(i in x) print(i)
x <- 1 + 1
2 * 3
> for(i in seq_along(x)) print(x[i])
expression(x <- 1 + 1)
expression(2 *
    3)

Alternatively:

> attributes(x[[2]])
NULL
> attributes(x[2])
$srcref
$srcref[[1]]
2 *
    3

As for whether this has any practical impact compared to eval(parse(..., keep.source=T)): I can only say that it could, but I can't imagine a situation where it does.
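
One way to probe this is to run the same parsed code through all three strategies and compare the outcome. The sketch below uses a made-up three-line script (the variable names are purely illustrative). For straight-line code like this, the three approaches should leave their environments in the same state and return the same final value. It says nothing about differences in error reporting or debugging information.

```r
# Hypothetical three-line script, held in a string instead of a file.
txt <- "a <- 1\nb <- a + 1\nb * 10"
exprs <- parse(text = txt, keep.source = TRUE)

# 1. Evaluate the whole expression vector at once.
env_all <- new.env()
res_all <- eval(exprs, env_all)

# 2. Iterate over indices and subset with `[`, as sys.source does.
env_idx <- new.env()
res_idx <- NULL
for (i in seq_along(exprs)) res_idx <- eval(exprs[i], env_idx)

# 3. Iterate directly over the elements (bare calls, no srcref).
env_el <- new.env()
res_el <- NULL
for (e in exprs) res_el <- eval(e, env_el)

# Compare the final values and the variables each approach created.
vars <- c("a", "b")
same_value <- identical(res_all, res_idx) && identical(res_idx, res_el)
same_state <- identical(mget(vars, envir = env_all), mget(vars, envir = env_idx)) &&
  identical(mget(vars, envir = env_idx), mget(vars, envir = env_el))
```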

Note that subsetting the expression one element at a time with [ also subsets the srcref attribute accordingly, which could conceivably be useful (...maybe?).
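
The srcref subsetting can also be checked programmatically rather than by reading printed output. This is a minimal sketch that reuses the txt example from above; the names n_all, n_sub, dropped and second_src are only illustrative.

```r
txt <- "x <- 1 + 1\n# abnormal expression\n  2 *\n    3\n"
x <- parse(text = txt, keep.source = TRUE)

# The full expression vector carries one srcref per top-level expression.
n_all <- length(attr(x, "srcref"))

# Subsetting with `[` keeps only the srcref of the selected expression.
n_sub <- length(attr(x[2], "srcref"))

# Extracting with `[[` returns a bare call without a srcref.
dropped <- is.null(attr(x[[2]], "srcref"))

# The surviving srcref still maps back to the original source lines.
second_src <- as.character(attr(x[2], "srcref")[[1]])
```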

BrodieG