0

I have the following R data.table, which is composed of only one column:

library(data.table)

DT <- data.table(first_column = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0))

> DT
    first_column
 1:            0
 2:            0
 3:            0
 4:            1
 5:            1
 6:            1
 7:            0
 8:            0
 9:            1
10:            1
11:            0
12:            0
13:            0
14:            0
15:            1
16:            1
17:            1
18:            1
19:            1
20:            0
21:            0
...          ...

The binary column first_column is composed of "clusters" of consecutive ones.

For each cluster, I would like to turn the 0 immediately preceding it into a 1. In other words, whenever a 1 follows a 0, that 0 should be changed into a 1.

EDIT: To be more clear, the pattern 0001110011000011111... would become 0011110111000111111...

ShanZhengYang

3 Answers

2

Try this using diff:

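# which() turns the comparison into positions; a bare logical index would be
# one element shorter than the column and would be recycled onto the last row
DT$first_column[which(diff(DT$first_column) == 1)] <- 1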

    # first_column
 # 1:            0
 # 2:            0
 # 3:            1
 # 4:            1
 # 5:            1
 # 6:            1
 # 7:            0
 # 8:            1
 # 9:            1
# 10:            1
# 11:            0
# 12:            0
# 13:            0
# 14:            1
# 15:            1
# 16:            1
# 17:            1
# 18:            1
# 19:            1
# 20:            0
# 21:            0
    # first_column

Basically, diff returns 1 at position i exactly when element i is 0 and element i + 1 is 1, so those positions are the zeros that directly precede a cluster.
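To make the indexing concrete, here is a minimal base-R sketch on a plain vector. The names x, d and idx are only illustrative and do not come from the question. It assumes the goal is exactly "turn a 0 into a 1 when the next value is 1".

```r
x <- c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0)

# d[i] is x[i + 1] - x[i], so d has length(x) - 1 elements
d <- diff(x)

# positions of zeros that are immediately followed by a one
idx <- which(d == 1)

x[idx] <- 1
print(x)
```

Using integer positions from which() sidesteps the length mismatch between d and x, which matters when the vector starts with 0, 1.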

989
2

This replaces the final value of each run of 0s or 1s with a 1. That is redundant for the runs of 1s, but it is what you want for the runs of 0s (if I read your question correctly).

DT[, c(head(first_column, -1), 1), by=rleid(first_column)]

rleid groups adjacent 0s and 1s into runs, and head with -1 keeps all but the final element of each run. Even better, you can use replace, as @Frank suggests:

DT[, replace(first_column, .N, 1), by=rleid(first_column)]

where .N is used to specify the final row in the group. Both of these return

    rleid V1
 1:     1  0
 2:     1  0
 3:     1  1
 4:     2  1
 5:     2  1
 6:     2  1
 7:     3  0
 8:     3  1
 9:     4  1
10:     4  1
11:     5  0
12:     5  0
13:     5  0
14:     5  1
15:     6  1
16:     6  1
17:     6  1
18:     6  1
19:     6  1
20:     7  0
21:     7  1
    rleid V1

Both solutions also (incorrectly) set the final observation to 1 when the data ends with a run of 0s, as row 21 above shows. One way to avoid this is to add a check before filling in the values.

DT[, if(.I[.N] < nrow(DT)) replace(first_column, .N, 1) else first_column,
   by=rleid(first_column)]

Here, .I[.N] is the row number of the last row in the current group, so .I[.N] < nrow(DT) is TRUE for every group except the final one. The final group is returned unchanged.
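As a small worked sketch of the grouping logic, the following uses a shorter, made-up column and stores the run ids in a helper column named run (both are assumptions for illustration, not part of the question's data).

```r
library(data.table)

DT <- data.table(first_column = c(0, 0, 1, 1, 0, 1, 0, 0))

# rleid() numbers each run of identical adjacent values: 1 1 2 2 3 4 5 5
DT[, run := rleid(first_column)]

# Within each run, set the last value to 1, except in the final run,
# which is returned unchanged so that trailing zeros stay zero
res <- DT[, if (.I[.N] < nrow(DT)) replace(first_column, .N, 1) else first_column,
          by = run]

print(res$V1)
```

Note that this returns a new table with the result in V1 rather than modifying first_column in place.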

lmo
  • 1
    `c(head(x, -1), 1)` is `replace(x, .N, 1)`, I guess. Btw, wrong output for the final row, eh. – Frank Apr 25 '17 at 16:12
  • 1
    Oh neat. I didn't think of `replace` in this context. – lmo Apr 25 '17 at 16:14
  • 1
    I guess you didn't see the edit to my comment, but your output is not right. Row 21 should have zero still. – Frank Apr 25 '17 at 16:31
2

If I understood the OP correctly, they want to turn any occurrence of the sub-sequence 0,1 into 1,1:

DT <- data.table(first_column = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0))

DT[first_column == 0 & shift(first_column, type = "lead") == 1, first_column := 1]

DT[, first_column]
# [1] 0 0 1 1 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 0 0

At the expense of implicit type conversions from double to logical, this can be written more concisely as:

DT <- data.table(first_column = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0))

DT[!first_column & shift(first_column, type = "lead"), first_column := 1]
DT[, first_column]
# [1] 0 0 1 1 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 0 0

This relies on the fact that 0 is treated as FALSE and any nonzero number as TRUE.
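A minimal sketch of what the condition evaluates to on a plain vector may help. The names x, nxt and hit are hypothetical, chosen only for this illustration.

```r
library(data.table)

x <- c(0, 1, 0, 0, 1, 0)

# shift(..., type = "lead") moves every value one position up;
# the last slot has no successor and becomes NA
nxt <- shift(x, type = "lead")

# !x is TRUE where x is 0; nxt is coerced to logical by `&`
hit <- !x & nxt

print(hit)
```

The last element of hit is NA because TRUE & NA is NA. When such a vector is used in the i argument of a data.table, NA rows are not selected, so a trailing 0 is left untouched.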

Uwe