
I am a fan of data.table, as well as of writing reusable functions for all current and future needs.

Here's a challenge I ran into while working on the answer to this problem: Best way to plot automatically all data.table columns using ggplot2

We pass a data.table to a plotting function, and the original data.table gets modified, even though we made a copy of it inside the function to prevent that.

Here's a simple example to illustrate:

library(data.table)
library(ggplot2)

plotYofX <- function(.dt,x,y) {
  dt <- .dt
  dt[, (c(x,y)) := lapply(.SD, function(x) {as.numeric(x)}), .SDcols = c(x,y)]
  ggplot(dt) + geom_step(aes(x=get(names(dt)[x]), y=get(names(dt)[y]))) + labs(x=names(dt)[x], y=names(dt)[y])
}


> dtDiamonds <- data.table(ggplot2::diamonds[2:5,1:3]); 
> dtDiamonds
   carat     cut color
   <num>   <ord> <ord>
1:  0.21 Premium     E
2:  0.23    Good     E
3:  0.29 Premium     I
4:  0.31    Good     J

> plotYofX(dtDiamonds,1,2); 
> dtDiamonds
    carat   cut color
    <num> <num> <ord>
1:  0.21     4     E
2:  0.23     2     E
3:  0.29     4     I
4:  0.31     2     J

I've seen many posts on various issues related to using := inside a function, but could not find any that helped me resolve this seemingly very simple issue. (Of course, I don't want to convert it back to a data.frame to achieve the desired outcome.)

IVIM

  • Since `:=` assigns values by reference, you need to make an explicit copy, so `dt <- .dt` should be `dt <- copy(.dt)`. See `?copy` for a discussion. – lmo Jun 20 '17 at 20:12
  • Seems simple. Don't use `:=`, but maybe I'm missing something. – IRTFM Jun 20 '17 at 21:59
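The reference semantics behind that comment can be seen in a few lines. This is a minimal sketch, assuming only the data.table package; `orig`, `alias`, and `indep` are illustrative names. `data.table::address()` returns the memory address of an object, which shows whether two names point at one table.

```r
library(data.table)

orig <- data.table(v = c(1, 2, 3))

alias <- orig                               # no copy: both names refer to one object
identical(address(alias), address(orig))    # TRUE
alias[, v := v * 10]                        # modifies the shared object by reference
orig$v                                      # 10 20 30: the "original" changed too

indep <- copy(orig)                         # deep copy: a separate object
identical(address(indep), address(orig))    # FALSE
indep[, v := 0]                             # only the copy is changed
orig$v                                      # still 10 20 30
```

This is exactly what happens in the question: `dt <- .dt` is the `alias` case, so the `:=` inside the function reaches the caller's table.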

3 Answers

3

Try:

dt <- copy(.dt)

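`copy()` makes a deep copy, so the `:=` inside the function then changes only the local copy and leaves the caller's data.table untouched.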
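A minimal sketch of the question's function with `copy()` in place, assuming data.table and ggplot2 are installed. Besides the copy, it looks up the column names once in `cols` and uses ggplot2's `.data[[...]]` pronoun instead of `get()`; those two changes are stylistic and are not needed for the fix itself.

```r
library(data.table)
library(ggplot2)

plotYofX <- function(.dt, x, y) {
  dt <- copy(.dt)                  # deep copy: `:=` below cannot reach the caller's table
  cols <- names(dt)[c(x, y)]
  dt[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
  ggplot(dt) +
    geom_step(aes(x = .data[[cols[1]]], y = .data[[cols[2]]])) +
    labs(x = cols[1], y = cols[2])
}

dtDiamonds <- data.table(ggplot2::diamonds[2:5, 1:3])
p <- plotYofX(dtDiamonds, 1, 2)    # print(p), or auto-print at the console, draws it
str(dtDiamonds)                    # cut and color are still ordered factors
```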

1

Thanks to the comments and answers above, this would be the easiest solution for this particular function (i.e. no need to introduce an additional .dt variable at all):

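plotYofX <- function(dt,x,y) {
  # Without `:=`, this returns a new table holding only the converted x and y
  # columns; the caller's table is untouched. Keep the result, or it is lost.
  num <- dt[, lapply(.SD, as.numeric), .SDcols = c(x,y)]
  ggplot(num) + geom_step(aes(x=.data[[names(num)[1]]], y=.data[[names(num)[2]]])) +
    labs(x=names(num)[1], y=names(num)[2])
}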

However, it was also important to learn that when working with data.table, one should be particularly careful not to make "copies" of it with the regular `<-` operator, which only binds a second name to the same table; use `copy(dt)` instead, so as not to corrupt the original data.table.
This is discussed in further detail here: Understanding exactly when a data.table is a reference to (vs a copy of) another data.table

IVIM

0

Just leaving out the `:=` function seemed to succeed. Of course, I wrapped the ggplot value in `print()`, as is standard practice when working inside a function and wanting output:

plotYofX <- function(.dt,x,y) {
  # No `:=`: `[` returns a new two-column table, so .dt itself is never modified
  dt <- .dt[, lapply(.SD, as.numeric), .SDcols = c(x,y)]
  print( ggplot(dt) + geom_step(aes(x=.data[[names(dt)[1]]], y=.data[[names(dt)[2]]])) + labs(x=names(dt)[1], y=names(dt)[2]) )
}

> png(); plotYofX(dtDiamonds,1,2); dev.off()
quartz 
     2 
>  dtDiamonds
   carat     cut color
1:  0.21 Premium     E
2:  0.23    Good     E
3:  0.29 Premium     I
4:  0.31    Good     J
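Why dropping `:=` protects the input can be checked directly. This is a minimal sketch, assuming only data.table, with a small hypothetical table `toy` that mirrors `dtDiamonds`: without `:=`, the `j` expression builds and returns a new data.table containing just the `.SDcols` columns, and the input stays as it was. The converted columns exist only in that return value, so they must be assigned to a name if the plot is meant to use them.

```r
library(data.table)

toy <- data.table(
  carat = c(0.21, 0.23, 0.29, 0.31),
  cut   = factor(c("Premium", "Good", "Premium", "Good"),
                 levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                 ordered = TRUE)
)

# No `:=`: j returns a brand-new table built from .SD
num <- toy[, lapply(.SD, as.numeric), .SDcols = c("carat", "cut")]

num$cut               # 4 2 4 2 (the factor's level codes)
is.ordered(toy$cut)   # TRUE: the input is unchanged
```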


IRTFM
  • Great! Actually, then there's no need even for `dt <- .dt`. Also, I'm not sure `print()` is required for ggplot; it plots without it anyway. – IVIM Jun 21 '17 at 02:03
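On the `print()` question: a ggplot object is drawn when it is printed, and R auto-prints only the visible value of a top-level expression. Calling the function at the console therefore plots without `print()`, but a plot built inside a `for` loop (or in a file run with `source()` without echo) is not auto-printed, so `print()` is needed there. A minimal sketch, assuming ggplot2 and its bundled `diamonds` data; the chosen columns are illustrative.

```r
library(ggplot2)

# One histogram per column; lapply gives each plot its own `col` binding
cols  <- c("carat", "depth")
plots <- lapply(cols, function(col) {
  ggplot(diamonds, aes(x = .data[[col]])) + geom_histogram(bins = 30)
})

# Values inside a for loop are not auto-printed, so print() is required here
for (p in plots) print(p)
```

Building the plots with `lapply()` rather than inside the `for` loop matters if they are kept for later: `aes()` evaluates `col` lazily, so plots created in a `for` loop and printed afterwards would all use the loop variable's last value.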