
I would be interested to know if there is a way to avoid generating NAs when subsetting .SD with an index larger than its number of rows, rather than having to remove them in a second step.

So far I only see how to remove the NAs in a second step, e.g. using na.omit:

data.table(A = 1)[ , .SD[1:2]]
##    A
## 1:  1
## 2: NA

# this is what I do now, removing NAs in a second step
na.omit(data.table(A = 1)[ , .SD[1:2]])
##    A
## 1: 1
user778806
  • ```data.table(A = 1)[ , .SD[1:3]][complete.cases(data.table(A = 1)[ , .SD[1:3]][])]``` – M-- Jun 15 '19 at 07:37
  • Thanks, but chaining is still a second step; I deliberately wrote "second step" and not "second line". If it avoids the (internal) creation and removal of NA rows then great, that is what I would like to avoid if possible. – user778806 Jun 15 '19 at 07:44
  • Well, @Henrik's point is very much valid. But say an operation does introduce `NA`s; I don't think there's an argument like `na.omit = TRUE` in `data.table`. – M-- Jun 15 '19 at 07:48
  • @Henrik I am working with ngrams; at a certain point I want to keep at most N predictions per predecessor, and predecessors that have fewer than N successors get NAs. I am in no way "affectionate" to using .SD; alternative approaches that avoid the NAs appearing are welcome. PS: I work inside a function that tries to manage ngrams for any n > 2, so the predecessor columns are passed as parameters. – user778806 Jun 15 '19 at 07:55
  • @M-- if your comment is confirmed, then it is an answer for me; let's wait 2 or 3 days. – user778806 Jun 15 '19 at 07:56
  • Maybe this has something to offer in the context you described: [StackOverflow: Fastest way to replace NAs in a large data.table](https://stackoverflow.com/questions/7235657/fastest-way-to-replace-nas-in-a-large-data-table) – M-- Jun 15 '19 at 07:59
  • You can do `data.table(A = 1)[ , head(.SD, x)]` if it's just the first up-to-x rows – Frank Jun 15 '19 at 14:42

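Following Frank's comment, a minimal single-step sketch (assuming the goal is keeping the first up-to-n rows per group): `head()` caps the subscript at `.N`, so no NA rows are ever created; a `seq_len(min(n, .N))` index achieves the same thing explicitly.

```r
library(data.table)

DT <- data.table(A = 1)

# head() stops at .N, so asking for 2 rows of a 1-row .SD
# simply returns the 1 existing row -- no NA padding
DT[ , head(.SD, 2)]
##    A
## 1: 1

# equivalent explicit form, useful if the index is built programmatically
DT[ , .SD[seq_len(min(2, .N))]]
##    A
## 1: 1
```

Both forms avoid the creation-then-removal round trip of the `na.omit` approach shown in the question.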
0 Answers