0

I am doing operations in two columns of a data.table in R. I have first filtered the datatable by DATE, by1min and by EX, but then I realized that I actually don't want the data within the 1 minute bin. I want it for everything before that bin.

What I had before was:

bid_ask <- new_file[,list(btail=tail(BID,n=1),atail=tail(ASK,n=1)), by=c("DATE","by1min","EX")]

And now I wanted to do something like:

bid_ask <- new_file[,list(btail=tail(BID,n=1),atail=tail(ASK,n=1)), by=c("DATE",TIME_M[TIME_M<by1min],"EX")]

But this does not seem to be the structure for data.table.

Does anyone know how to filter conditionally within "by"?

My data looks like this:

         DATE   TIME_M EX SYM_SUFFIX   BID   ASK by1min
1: 2016-03-01 14400.02  P         NA 60.00 95.00  14400
2: 2016-03-01 24889.07  T         NA 60.01 65.00  24840
3: 2016-03-01 24889.25  T         NA 60.01 64.99  24840
4: 2016-03-01 25085.24  T         NA 60.01 64.99  25080
5: 2016-03-01 25085.36  T         NA 62.50 64.99  25080
6: 2016-03-01 25103.37  T         NA 62.51 64.99  25080
python_enthusiast
  • 896
  • 2
  • 7
  • 26
  • 2
    Hi @python_newbie - its impossible to help you without knowing what your data.table looks like. Please take a look at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and update your question. – Scott Ritchie Feb 18 '18 at 20:54
  • Sorry... I have added the data. What I would like to do is get the last posted bid/ask for each exchange until the threshold of a minute. But I am having a hard time using "by". As you can see, I can filter "within" a 1 minute bin, but not "until" the 1 min bin – python_enthusiast Feb 18 '18 at 21:12
  • 1
    How do you want to filter your data? What is expected end result? – MKR Feb 18 '18 at 21:37
  • Is the problem that the code you've written doesn't run? The data structure you've posted is not a `data.table`, it looks like a `tibble` generated by `dplyr` code. You should be able to convert this to a `data.table` by running either `as.data.table(new_file)` or `as.data.table(as.data.frame(new_file))`. – Scott Ritchie Feb 18 '18 at 23:18
  • I updated the data to be a data.table instead of what it was before. I want to filter the data by "DATE", then by everything below the 1 minute threshold (eg, <24840, or <25080), then by "EX". Right now, what I have gives me the first and the last filter, but the second filter gives me everything within a 1 minute bin, not everything before the threshold. – python_enthusiast Feb 20 '18 at 05:24

1 Answers1

1

The syntax is something like this:

bid_ask <- new_file[, list(btail=tail(BID, n=1),atail=tail(ASK, n=1)), by=.(DATE, TIME_M < by1min, EX)]
bid_ask 
#         DATE TIME_M EX btail atail
#1: 2016-03-01  FALSE  P 60.00 95.00
#2: 2016-03-01  FALSE  T 62.51 64.99
#3: 2016-03-01   TRUE  T 62.51 64.99
#4: 2016-03-01   TRUE  P 60.00 95.00

from ?data.table documentation:

by accepts:

  • a list() of expressions of column names: e.g., DT[, .(sa=sum(a)), by=.(x=x>0, y)]

sample data:

dat <- fread("DATE   TIME_M EX SYM_SUFFIX   BID   ASK by1min
2016-03-01 14400.02  P         NA 60.00 95.00  14400
2016-03-01 24889.07  T         NA 60.01 65.00  24840
2016-03-01 24889.25  T         NA 60.01 64.99  24840
2016-03-01 25085.24  T         NA 60.01 64.99  25080
2016-03-01 25085.36  T         NA 62.50 64.99  25080
2016-03-01 25103.37  T         NA 62.51 64.99  25080
2016-03-01 25000  T         NA 62.51 64.99  25080
2016-03-01 14399  P         NA 60.00 95.00  14400")
chinsoon12
  • 25,005
  • 4
  • 25
  • 35