Expand number range to the individual numbers with NAs

Question

This question is based on this post.

Say that I have a data frame with an NA in the end column:

df <- data.frame(start = c(10, 20), end = c(15,NA), label = c('ex1','ex2'))

When I use the following code:

df[, seq(.SD[['start']], .SD[['end']]), by = label]

I get the following error:

Error in `[.data.frame`(df, , seq(.SD[["start"]], .SD[["end"]]), by = label) : 
  unused argument (by = label)

How do I get something like this?:

    label V1
 1:   ex1 10
 2:   ex1 11
 3:   ex1 12
 4:   ex1 13
 5:   ex1 14
 6:   ex1 15
 7:   ex2 20

Ronak Shah · Accepted Answer · 2021-08-03T13:43:19.740

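The error occurs because df is a plain data.frame, and the data.frame method for [ has no by argument. Convert it to a data.table with setDT() first. You can then use fcoalesce to replace the NA values in end with the corresponding start value and create a sequence from start to end for each label. Rows where both start and end are NA are dropped beforehand.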

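library(data.table)

setDT(df)                                # convert df to a data.table by reference
df <- df[!(is.na(start) & is.na(end))]   # drop rows where both start and end are NA
df[, end := fcoalesce(end, start)]       # a missing end becomes a one-number range
df[, seq(start, end), by = label]        # expand each label's range into its numbers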

#   label V1
#1:   ex1 10
#2:   ex1 11
#3:   ex1 12
#4:   ex1 13
#5:   ex1 14
#6:   ex1 15
#7:   ex2 20
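
To check the steps above end to end, they can be run on a slightly larger input. This is a minimal sketch, not part of the original answer: the ex3 row (NA in both start and end) is a hypothetical addition to show the drop step from the comments, and dt and res are illustrative names.

```r
library(data.table)

# Example data; the 'ex3' row (both bounds NA) is a hypothetical addition
df <- data.frame(start = c(10, 20, NA),
                 end   = c(15, NA, NA),
                 label = c("ex1", "ex2", "ex3"))

dt <- as.data.table(df)                  # work on a copy, df itself stays a data.frame
dt <- dt[!(is.na(start) & is.na(end))]   # drop rows where both bounds are NA
dt[, end := fcoalesce(end, start)]       # a missing end falls back to start
res <- dt[, .(V1 = seq(start, end)), by = label]
res
```

Note that this only covers a missing end. A row with NA in start but a value in end would still reach seq() with an NA and fail, so such rows would need their own rule. The grouping also assumes each label appears on a single row.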

Or using dplyr -

library(dplyr)

df %>%
  filter(!(is.na(start) & is.na(end))) %>%
  mutate(end = coalesce(end, start)) %>%
  group_by(label) %>%
  summarise(num = seq(start, end), .groups = 'drop')
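
On dplyr 1.1.0 and later, summarise() gives a deprecation warning when a group returns more than one row, and reframe() is the suggested replacement. A minimal sketch of the same pipeline under that assumption (the data is recreated so the block runs on its own, and res is an illustrative name):

```r
library(dplyr)

df <- data.frame(start = c(10, 20), end = c(15, NA), label = c("ex1", "ex2"))

res <- df %>%
  filter(!(is.na(start) & is.na(end))) %>%   # drop rows where both bounds are NA
  mutate(end = coalesce(end, start)) %>%     # a missing end falls back to start
  group_by(label) %>%
  reframe(num = seq(start, end))             # may return several rows per group
res
```

reframe() returns an ungrouped result, so the .groups argument is not needed here.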

edited Aug 03 '21 at 13:43

answered Aug 03 '21 at 02:08

Ronak Shah


This is wonderful. How do I do it if I have NAs for both start and end? – hy9fesh Aug 03 '21 at 13:29
What do you want to do when you have `NA`s in `start` as well as `end`? – Ronak Shah Aug 03 '21 at 13:32
Drop the entry. – hy9fesh Aug 03 '21 at 13:34
You can drop such rows first with `df <- df[!(is.na(start) & is.na(end))]` before running the `fcoalesce` code. – Ronak Shah Aug 03 '21 at 13:38
Thanks. Is there a way to do it in dplyr? – hy9fesh Aug 03 '21 at 13:40
