This question is based on this post.

Say that I have a data frame with an NA in the end column:

df <- data.frame(start = c(10, 20), end = c(15,NA), label = c('ex1','ex2'))

When I use the following code:

df[, seq(.SD[['start']], .SD[['end']]), by = label]

I get the following error:

Error in `[.data.frame`(df, , seq(.SD[["start"]], .SD[["end"]]), by = label) : 
  unused argument (by = label)

How do I get something like this?

   label V1
1:   ex1 10
2:   ex1 11
3:   ex1 12
4:   ex1 13
5:   ex1 14
6:   ex1 15
7:   ex2 20
hy9fesh

1 Answer

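The error occurs because df is a plain data.frame, and `[.data.frame` has no by argument; convert it to a data.table with setDT() first. You can then use fcoalesce to replace the NA values in end with the corresponding start value and create a sequence from start to end for each label. Rows where both start and end are NA are dropped first, since no sequence can be built from them.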

library(data.table)

setDT(df)
df <- df[!(is.na(start) & is.na(end))]
df[, end := fcoalesce(end, start)]
df[, seq(start, end), by = label]

#   label V1
#1:   ex1 10
#2:   ex1 11
#3:   ex1 12
#4:   ex1 13
#5:   ex1 14
#6:   ex1 15
#7:   ex2 20
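
If you would rather not overwrite the end column, the same idea can be sketched as a single chained call. This is only an illustration built from the question's example data: the names dt and res are made up here, and it assumes start is never NA on its own (seq() fails on an NA endpoint).

```r
library(data.table)

# Example data from the question, created directly as a data.table
# (dt and res are illustrative names, not from the original post).
dt <- data.table(start = c(10, 20), end = c(15, NA), label = c("ex1", "ex2"))

# Drop rows where both endpoints are missing, then build one sequence per
# label. fcoalesce(end, start) falls back to start wherever end is NA,
# so the end column itself is left unchanged.
res <- dt[!(is.na(start) & is.na(end)),
          .(V1 = seq(start, fcoalesce(end, start))),
          by = label]

print(res)
```

Here fcoalesce is applied inside j, so dt still holds the original NA afterwards.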

Or, using dplyr:

library(dplyr)

df %>%
  filter(!(is.na(start) & is.na(end))) %>%
  mutate(end = coalesce(end, start)) %>%
  group_by(label) %>%
  summarise(num = seq(start, end), .groups = 'drop')
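
Note that from dplyr 1.1.0 onwards, returning more than one row per group from summarise() is deprecated in favor of reframe(). A minimal sketch of the same pipeline with reframe(), assuming dplyr >= 1.1.0 is installed and using the question's example data (res is just an illustrative name):

```r
library(dplyr)

# Example data from the question.
df <- data.frame(start = c(10, 20), end = c(15, NA), label = c("ex1", "ex2"))

res <- df %>%
  # Drop rows where both endpoints are missing.
  filter(!(is.na(start) & is.na(end))) %>%
  # Fall back to start wherever end is NA.
  mutate(end = coalesce(end, start)) %>%
  group_by(label) %>%
  # reframe() may return any number of rows per group and
  # always returns an ungrouped result, so .groups is not needed.
  reframe(num = seq(start, end))

print(res)
```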
Ronak Shah