
I'm fairly new to data.table, coming from dplyr. In the function below, the DT object is (as I understand it) a reference to the result of the tryCatch statement, and when returned it does not behave as I would like.

From reading "Understanding exactly when a data.table is a reference to (vs a copy of) another data.table", my understanding is that omitting copy at the return statement returns a reference to the result of the tryCatch statement, which in turn (if successful) returns the modified data.table object.

Calling copy at the end of the function is unnecessary overhead. How do I return the tryCatch result without calling copy? Without copy, the function returns an object that is (as I understand it) a reference to the tryCatch result, which is not what I want.

Code

library(data.table)

Load_Data <- function(path) {

  col.names <- c('Ticker', 'Date', 'Time',
                 'Open', 'High', 'Low', 'Close', 'Volume')

  DT <- tryCatch({

    DT.try <- fread(path)

    # rename columns, then reformat the Date column
    setnames(DT.try, col.names)
    DT.try[, Date := as.Date(as.character(Date), format = '%Y%m%d')]

  }, warning = function(w) {

    print(w); cat('Warning on reading file: ', path, '\n')
    # return despite warning (DT.try only exists if fread() itself succeeded)
    return(DT.try)

  }, error = function(e) {

    print(e); cat('Error on reading file: ', path, '\n')
    return(NA)
  })
  return(copy(DT)) # how do I avoid using copy()?
}

# when not returning with copy(DT), then this happens (console output)
> a <- Load_Data('data.example.csv')
> a # a is copied + loaded into memory? 
> a # a is NOW printed

      Ticker       Date  Time     Open     High       Low     Close Volume
   1:    AAK 2005-09-29 00:00 100.0189 100.7159  98.62490  98.62490  17791
   2:    AAK 2005-09-30 00:00  98.9734  99.6704  98.27640  99.67040  35438
   3:    AAK 2005-10-03 00:00  99.3219 100.3674  97.57941  97.57941   6600
   4:    AAK 2005-10-04 00:00  98.2764  98.2764  97.92791  98.27640  31564
   5:    AAK 2005-10-05 00:00  98.2764  99.3219  98.27640  99.32190   3730

data.example.csv | original data being read into Load_Data()

  <TICKER> <DTYYYYMMDD> <TIME>   <OPEN>   <HIGH>    <LOW>  <CLOSE> <VOL>
1:      AAK     20050929  00:00 100.0189 100.7159 98.62490 98.62490 17791
2:      AAK     20050930  00:00  98.9734  99.6704 98.27640 99.67040 35438
3:      AAK     20051003  00:00  99.3219 100.3674 97.57941 97.57941  6600
4:      AAK     20051004  00:00  98.2764  98.2764 97.92791 98.27640 31564
5:      AAK     20051005  00:00  98.2764  99.3219 98.27640 99.32190  3730
6:      AAK     20051006  00:00  99.3219  99.3219 98.27640 98.27640 10187
uncool
  • The `DT` object is local to the function `Load_Data`. Hence, its lifespan is restricted to the `Load_Data` function. – MKR Feb 03 '18 at 10:24
  • I think you're ok if you just remove `DT <-` and return the output of the `tryCatch` statement – MichaelChirico Feb 03 '18 at 10:29
  • @MichaelChirico That should work. But then I must put the tryCatch statement as the last expression. That method would break if I wanted to have anything after the tryCatch. Please correct me if I'm wrong in this. – uncool Feb 03 '18 at 10:33
  • Actually, what I wanted to say in my previous comment was that, had `DT` been a variable in the `global` environment, it would have been good to `copy` and return it. But in the OP's case that is not needed. – MKR Feb 03 '18 at 10:37
  • Just put the rest of the function inside your `tryCatch`, I believe that should suffice. If you have an example where you think that won't cover it, please update. In fact, thinking a bit more, why don't you just axe the `tryCatch` from `Load_Data`? It's not like you're taking any conditional action that depends on the contents of the file in `path`. It looks like it'd be fine to just `tryCatch(Load_Data(path), ...)` – MichaelChirico Feb 03 '18 at 12:14
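
The suggestion in the last comment could be sketched roughly as below. This is a minimal sketch under stated assumptions, not a fix confirmed in this thread: `load_data_plain` and `safe_load` are hypothetical names, and handling warnings with `withCallingHandlers` is a swapped-in choice so that the load continues after a warning instead of being abandoned. The trailing `DT[]` follows the data.table FAQ advice for functions whose last step is `:=`, which returns its result invisibly; that is a likely (but unverified here) reason `a` had to be typed twice before it printed, and it should not require a deep `copy()`.

```r
library(data.table)

# Hypothetical loader with no tryCatch inside it.
load_data_plain <- function(path) {
  col.names <- c('Ticker', 'Date', 'Time',
                 'Open', 'High', 'Low', 'Close', 'Volume')
  DT <- fread(path)
  setnames(DT, col.names)                      # rename in place
  DT[, Date := as.Date(as.character(Date), format = '%Y%m%d')]
  # `:=` returns invisibly; a trailing [] returns the table visibly
  # (idiom from the data.table FAQ), without calling copy().
  DT[]
}

# Hypothetical wrapper: error handling lives at the call site.
safe_load <- function(path) {
  withCallingHandlers(
    tryCatch(
      load_data_plain(path),
      error = function(e) {
        message('Error on reading file: ', path, ' (', conditionMessage(e), ')')
        NA                                     # same fallback as the original
      }
    ),
    warning = function(w) {
      # Log the warning, then resume loading instead of aborting.
      message('Warning on reading file: ', path, ' (', conditionMessage(w), ')')
      invokeRestart('muffleWarning')
    }
  )
}
```

Used as `a <- safe_load('data.example.csv')`, this keeps `Load_Data`-style logic free of condition handling, so further steps can be added after the `:=` line without restructuring the `tryCatch`.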

0 Answers