
I would be interested to know if there is a way to avoid generating NAs when subsetting .SD with an index larger than its number of rows, rather than having to remove them in a second step.

So far I only see how to remove the NAs in a second step, e.g. using na.omit:

data.table(A = 1)[ , .SD[1:2]]
##    A
## 1:  1
## 2: NA

# this is what I do now, removing NAs in a second step
na.omit(data.table(A = 1)[ , .SD[1:2]])
##    A
## 1: 1
user778806
  • ```data.table(A = 1)[ , .SD[1:3]][complete.cases(data.table(A = 1)[ , .SD[1:3]][])]``` – M-- Jun 15 '19 at 07:37
  • Thanks, but chaining is still a second step; I deliberately wrote "second step" and not "second line". If it avoids the (internal) creation and removal of NA rows then great, that is what I would like to avoid if possible. – user778806 Jun 15 '19 at 07:44
  • Well, @Henrik's point is very much valid. But say an operation does introduce `NA`s; I don't think there's an argument like `na.omit = TRUE` in `data.table`. – M-- Jun 15 '19 at 07:48
  • @Henrik I am working with ngrams; at a certain point I want to keep at most N predictions per predecessor, and predecessors that have fewer than N successors get NAs. I am in no way "affectionate" to using .SD; alternative approaches that avoid the NAs appearing are welcome. PS: I work inside a function that tries to manage ngrams for any n > 2, so the predecessor columns are passed as parameters. – user778806 Jun 15 '19 at 07:55
  • @M-- if your comment is confirmed, then it is an answer for me; let's wait 2 or 3 days. – user778806 Jun 15 '19 at 07:56
  • Maybe this has something to offer in the context you described: [StackOverflow: Fastest way to replace NAs in a large data.table](https://stackoverflow.com/questions/7235657/fastest-way-to-replace-nas-in-a-large-data-table) – M-- Jun 15 '19 at 07:59
  • You can do `data.table(A = 1)[ , head(.SD, x)]` if it's just the first up-to-x rows – Frank Jun 15 '19 at 14:42

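Following Frank's comment, a minimal single-step sketch (assuming the goal is keeping the first up-to-n rows per group): `head()` caps the subscript at `.N`, so no NA rows are ever created; a `seq_len(min(n, .N))` index achieves the same thing explicitly.

```r
library(data.table)

DT <- data.table(A = 1)

# head() stops at .N, so asking for 2 rows of a 1-row .SD
# simply returns the 1 existing row -- no NA padding
DT[ , head(.SD, 2)]
##    A
## 1: 1

# equivalent explicit form, useful if the index is built programmatically
DT[ , .SD[seq_len(min(2, .N))]]
##    A
## 1: 1
```

Both forms avoid the creation-then-removal round trip of the `na.omit` approach shown in the question.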
0 Answers