Consider the following code:

lotsOfNumbers <- rep(pi, 5 * 10^7) 
system.time(lotsOfNumbers[1] <- 0)

If I enter one line of code at a time, the second line takes about 0.267 seconds to evaluate on my computer. This is not surprising to me: I know that the R code lotsOfNumbers[1] <- 0 actually copies lotsOfNumbers into a new object, so the assignment scales with the length of lotsOfNumbers rather than running in constant time.

What is surprising to me is that if I enter both lines of code at the same time, system.time reports that the second line takes 0.000 seconds on my computer.

Why would entering both lines at the same time speed up the second line?

Thomas Baruchel
Cliff AB
  • Interesting... Not an answer but maybe a clue: encapsulate your first line in a system.time() call as well. Then running both lines at the same time or running them one by one makes no difference in the durations. – Dominic Comtois Nov 28 '15 at 04:29
  • @DominicComtois: yes, I had noticed this...but then if you run further lines of code like the second one (i.e. `system.time(lotsOfNumbers[1] <- 1)`), they are fast (even though the second one is still slow). – Cliff AB Nov 28 '15 at 04:35
  • Intriguing... maybe a glitch in the system.time function? I tried running it in batch mode, and have the same odd time of 0 at the end. – Dominic Comtois Nov 28 '15 at 04:44
  • @TheTime It is... with `pryr` you can validate that, see http://adv-r.had.co.nz/memory.html – Dominic Comtois Nov 28 '15 at 06:03
  • I'm going to guess you're using RStudio. Am I right? If so, to see why I guessed that (and why you're seeing the behavior you are), [see here](http://stackoverflow.com/a/15559956/980833) especially the section leading up to and including the 3rd code block. – Josh O'Brien Nov 28 '15 at 06:03
  • @JoshO'Brien I replicated the phenomenon using Rscript.exe... – Dominic Comtois Nov 28 '15 at 06:05
  • @JoshO'Brien: Ah yes! I think you've got it there. Going to read through a little more on that post... – Cliff AB Nov 28 '15 at 06:06
  • @DominicComtois How do you enter those lines one at a time using Rscript? – Josh O'Brien Nov 28 '15 at 06:06
  • @JoshO'Brien Well obviously I don't. It was to rule out the influence of the interface (RStudio or otherwise). – Dominic Comtois Nov 28 '15 at 06:09
  • @CliffAB -- Hmm. That's odd. I don't see that using R (via Rgui on a Windows 7 box). The call to `system.time(lotsOfNumbers[2] <- 0)` takes 0 seconds in either case (i.e. whether the lines are entered together or one after the other). – Josh O'Brien Nov 28 '15 at 06:11
  • @JoshO'Brien: yes, you are correct. I had only replicated the slow part in R (i.e. entering both at the same time) so I thought they were identical. But you are correct, using R the insertion is fast, even if broken up. So I'm now buying your RStudio argument (sorry, I edited my earlier comment where I disagreed with that). – Cliff AB Nov 28 '15 at 06:14
  • @DominicComtois -- Likewise, on my Windows 7 box, using Rscript gets me timings of 0 seconds across the board. Wonder what's going on. – Josh O'Brien Nov 28 '15 at 06:14
  • Re: in-place or not, in RStudio the vector is indeed copied when modifying 1st item (no need to use system.time for that to happen). But in a "terminal" R.exe session, it is modified in-place. Boy, still puzzled here. – Dominic Comtois Nov 28 '15 at 06:22
  • @TheTime Oh I wouldn't _sue_ them for that tho :p ... But in any case, the behavior of system.time is still weird even in batch mode! – Dominic Comtois Nov 28 '15 at 06:29
  • @TheTime Hmm not sure I'm following. Are we talking about the same issue? Here's what I put in the script to run via Rscript.exe (on 3 separate lines): `system.time(lotsOfNumbers <- rep(pi, 5e7)) system.time(lotsOfNumbers[1] <- 0) system.time(lotsOfNumbers[2] <- 0)` – Dominic Comtois Nov 28 '15 at 06:38
  • @DominicComtois Probably not talking about the exact same issue. Your example is highlighting the fact that the version of `lotsOfNumbers` returned by the call to `system.time()` has `NAM[2]` whereas that created outside of `system.time()` has `NAM[1]`. (To see this, compare: `system.time(X <- rep(pi, 5e7)); .Internal(inspect(X)); Y <- rep(pi, 5e7); .Internal(inspect(Y))`.) This in turn affects whether or not the entire object gets copied over when performing a simple replacement. – Josh O'Brien Dec 02 '15 at 19:13
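
The copy-versus-in-place distinction discussed in the comments above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not a diagnosis of the timings in the question: the names `x`, `y`, `z`, `w` and the size `n` are made up for the example, and the `tracemem()` part runs only if this R build was compiled with memory profiling (checked via `capabilities("profmem")`). Whether the first traced replacement copies can depend on the front end and the R version.

```r
# Sketch: whether `x[1] <- 0` copies the vector depends on whether the
# vector is (or appears to be) referenced from somewhere else.
n <- 1e6  # hypothetical size, smaller than in the question to keep it quick

x <- rep(pi, n)
y <- x        # y now refers to the same underlying vector as x
x[1] <- 0     # R has to copy before modifying, so y is left untouched

stopifnot(x[1] == 0, y[1] == pi)

# tracemem() reports duplications, but only in builds with memory profiling.
if (capabilities("profmem")) {
  z <- rep(pi, n)
  tracemem(z)
  z[1] <- 0   # typically no message in a plain R session (in-place update)
  w <- z      # add a second reference to the same vector
  z[2] <- 0   # a "tracemem[...]" message is expected here (z is duplicated)
  untracemem(z)
}
```

Running the same lines in a plain R terminal and in an IDE session is one way to check whether the environment itself adds a reference to the object.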

0 Answers