
I have a dataframe df that is quite large. At some point, I want to save it to the database, using sqlSave(). Since the database table has a mostly similar structure but not quite the same column names, I have some massaging to do first. Therefore, I do the following:

# I believe I'm copying by value
mycopy <- df

# but I also know setnames() works by reference... so is this the culprit?
setnames(mycopy, names(mycopy)[1], "NewColumnName")

I was horrified to discover (due to warnings from other parts of my app) that the original 'df' data frame, as shown in my Global Environment window in RStudio, had its column names renamed! How do I stop this from happening? Why isn't setnames() changing only 'mycopy' instead of also changing the original 'df'?

UPDATE: reproducible example:

library(data.table) # for setnames
df <- data.frame(foo=c(1,2,3), bar=c(4,5,6))
mycopy <- df
setnames(mycopy, names(mycopy)[1], "NewColumnName")
str(df)

yields this:

> str(df)
'data.frame':   3 obs. of  2 variables:
 $ NewColumnName: num  1 2 3
 $ bar          : num  4 5 6

Surely this should not happen?

Matt Dowle
rstruck
  • `setnames` is not a function from a base R package, so you should add a more complete code example (not too much more complete!) that includes the package(s) you are using. – Martin Morgan Apr 10 '15 at 16:04
  • I'm sorry, but I can't tell where setnames is from. When I do ?setnames, RStudio shows me the description but not which package it comes from. – rstruck Apr 10 '15 at 16:11
  • UPDATE: apparently it's from data.table. I will try to build a reproducible example and update the question in a bit. – rstruck Apr 10 '15 at 16:33

1 Answer


Actually, this non-R question, "How to clone or copy a list?", gave me a hint about how to force a copy by value.

So in the end I did:

mycopy <- data.frame(df)

and it seems to work. I don't know how I never noticed R was doing copy by ref before now...
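For comparison, here is a minimal sketch of another way to get an independent copy, assuming the data.table package is installed. It relies on data.table's copy() function, which is not part of the original post; as far as I understand it, copy() makes a deep copy, so by-reference functions such as setnames() then touch only the copy.

```r
library(data.table)  # provides setnames() and copy()

df <- data.frame(foo = c(1, 2, 3), bar = c(4, 5, 6))

# Assumption: copy() returns a deep copy, so mycopy no longer
# shares its names attribute with df
mycopy <- copy(df)

# setnames() modifies its argument in place
setnames(mycopy, "foo", "NewColumnName")

names(df)      # the original names should be unchanged
names(mycopy)  # the first column should now be renamed
```

Whether this is preferable to data.frame(df) is a judgment call; copy() simply states the intent more explicitly.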

Community
rstruck
  • In base R this step would not be necessary; you've stumbled into a data.table-ism. You're actually coercing from a data.table (despite the fact that you've named your variable `df`!) to a data.frame, rather than making a copy of one data.frame to another data.frame, or one data.table to another data.table. [This](http://stackoverflow.com/questions/10225098/understanding-exactly-when-a-data-table-is-a-reference-to-vs-a-copy-of-another) is a relevant post for pass-by-reference in data.table. – Martin Morgan Apr 10 '15 at 17:42
  • @MartinMorgan thank you. All of this is quite the revelation! – rstruck Apr 10 '15 at 21:19
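
To illustrate the comment about base R, here is a small sketch that uses only base functions. It renames the column with the names<- replacement function instead of setnames(), so R's usual copy-on-modify behaviour applies and the original data frame is left alone. The data mirrors the reproducible example in the question.

```r
df <- data.frame(foo = c(1, 2, 3), bar = c(4, 5, 6))

# Plain assignment: both names refer to the same object for now
mycopy <- df

# Base R replacement function: modifying mycopy triggers a copy,
# so df is not affected
names(mycopy)[1] <- "NewColumnName"

names(df)      # "foo" "bar"
names(mycopy)  # "NewColumnName" "bar"
```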