I have the following data.table, with a character column that mixes NA and non-NA values:
library(data.table)
dt = fread(...)
print(dt$column1)
[1] NA NA NA "1 1" "1 1" "1 1" NA NA NA NA
[11] NA "1 2" NA NA NA NA NA NA NA NA
[21] NA NA NA NA NA NA NA NA NA NA
[31] NA NA NA NA NA "1 3" NA NA NA NA
[41] NA "1 4" "1 4" NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA
[61] NA NA "1 5" NA NA NA NA NA NA NA
...
I would like a new column that labels each run of consecutive non-NA values with an increasing integer (1, 2, 3, ...) and gives 0 to the NA rows, i.e.:
print(dt$groups)
[1] 0 0 0 1 1 1 0 0 0 0
[11] 0 2 0 0 0 0 0 0 0 0
[21] 0 0 0 0 0 0 0 0 0 0
[31] 0 0 0 0 0 3 0 0 0 0
[41] 0 4 4 0 0 0 0 0 0 0
[51] 0 0 0 0 0 0 0 0 0 0
[61] 0 0 5 0 0 0 0 0 0 0
...
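One possible way to express the labelling above is a minimal sketch with `data.table::rleid()`. It assumes `column1` holds real `NA` values (not the string `"NA"`), and the short `dt` below is a hypothetical stand-in for the elided `fread(...)` result. The idea: number every run of `is.na(column1)`, then renumber only the non-NA runs.

```r
library(data.table)

# Hypothetical stand-in for the fread() result (first 13 values of column1)
dt <- data.table(column1 = c(NA, NA, NA, "1 1", "1 1", "1 1",
                             NA, NA, NA, NA, NA, "1 2", NA))

# rleid() gives one id per run of equal values, so NA runs and non-NA runs both get ids
dt[, run_id := rleid(is.na(column1))]

# NA rows default to 0; non-NA rows get their run ids renumbered as 1, 2, 3, ...
dt[, groups := 0L]
dt[!is.na(column1), groups := rleid(run_id)]
dt[, run_id := NULL]

print(dt$groups)
```

On this stand-in, `groups` should come out as `0 0 0 1 1 1 0 0 0 0 0 2 0`, matching the pattern in the expected output.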
If I try this:
dt[, groups := !is.na(column1)]
This gives me a logical vector containing runs of consecutive TRUE values, but I am not sure how to turn it into one label per run of TRUE.
Is there a data.table way to do this?
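Starting from the logical vector `!is.na(column1)`, one possible sketch marks the first row of each TRUE run and takes a cumulative sum of those marks. This is only an illustration: it uses `data.table::shift()` to look at the previous row, the helper columns `nonNA` and `run_start` are hypothetical names, and the small `dt` is a stand-in for the real data.

```r
library(data.table)

# Hypothetical stand-in for the real column1
dt <- data.table(column1 = c(NA, NA, NA, "1 1", "1 1", "1 1", NA, NA,
                             NA, NA, NA, "1 2", NA, "1 3", "1 3"))

# The logical vector from the question: TRUE where column1 is non-NA
dt[, nonNA := !is.na(column1)]

# A TRUE run starts where the current row is TRUE and the previous row is FALSE
# (fill = FALSE treats the position before row 1 as FALSE)
dt[, run_start := nonNA & !shift(nonNA, fill = FALSE)]

# cumsum() counts the run starts seen so far; multiplying by nonNA sets NA rows to 0
dt[, groups := cumsum(run_start) * nonNA]
dt[, c("nonNA", "run_start") := NULL]

print(dt$groups)
```

Here `groups` should be `0 0 0 1 1 1 0 0 0 0 0 2 0 3 3`. Because `cumsum()` on a logical vector returns an integer vector, the result stays integer after the multiplication.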