Conditionally changing cell value when string appears in a separate column in R: time series

Question

Problem:

The starting states of "b" and "c" are determined by presence in another table.

dd$b[1] <- ifelse( "b" %in% table,1,0)  
dd$c[1] <- ifelse( "c" %in% table,1,0)

I'm hoping to come up with a formula that:

Checks to see if the value in column a contains "lamp" and "b" and makes b a 0 if so,
Else checks to see if the value in column a contains "lion" and "b" and makes b a 1 if so,
Or defaults to the value in the previous row for b.

To include to example outputs, the column c is included and should follow the same rules as the above.

Referring to the cell value above is crucial as a third step, because this is designed "change states" as time progresses.

Pasteable table:

dd = read.table(header = T, text = "    time   a    b   c   d   s   
k   w
1   18:41  'b d cat'       1     0   1   0    0   1
2   18:43  'b d dog'       1     0   1   0    0   1
3   18:47  'b d lamp'      0     0   0   0    0   1
4   18:51  'b s dog'       0     0   0   0    0   1
5   18:52  'b k cat'       0     0   0   0    0   1
6   18:57  'b c lion'      1     1   0   0    0   1
7   18:59  'b a dog'       1     1   0   0    0   1
8   19:23  'b w lamp'      0     1   0   0    0   0 
9   19:25  'b r cat'       0     1   0   0    0   0")

Desired Output:

    time   a           b      c
1   18:41  b d cat       1     0
2   18:43  b d dog       1     0 
3   18:47  b d lamp      0     0
4   18:51  b s dog       0     0
5   18:52  b k cat       0     0
6   18:57  b c lion      1     1
7   18:59  b a dog       1     1
8   19:23  b w lamp      0     1 
9   19:25  b r cat       0     1

Essentially, I am looking for a way to make column a's values operate as an on/off switch for the flag variables in b and c.

This is something that can be done quickly in Excel using relational formulas, but I would love an elegant R-based solution if one exists!

To solve the problem, the tricks I would need to incorporate (at least to my understanding) would be

Come up with a formula and have it only apply to all rows except the first row.
Refer to the row above if states are not changed based on the first two bits of logic.

Thank you in advance.

While there are other ways to do this I would take a look at `?grepl`. `df$b[grepl("lamp", df$a)] <- 0` — Ian Wesley, Aug 06 '18 at 19:42
Hi @akrun. I should have been more clear that the table I had included was the desired output. I've edited the question based on your input. — Pake, Aug 06 '18 at 20:22
@Ian Wesley Thank you for the direction. I think grepl may be the part of the cure. The references to another table for the top row and the references to the cell above have got me stuck. — Pake, Aug 06 '18 at 20:25
@Pake sorry I am having a very hard time understanding what you are trying to accomplish. Also it sounds like there is probably a better data structure to use. You seem to be approaching this as an Excel problem rather than as an R problem. However look at `?dplyr::lag` to lag to another row. — Ian Wesley, Aug 06 '18 at 20:32
Just fill in the results for conditions 1 and 2 using `grepl`, leaving everything else as `NA`. Then use `zoo::na.locf` to fill all the `NA`s with the previous non-missing value. — Gregor Thomas, Aug 06 '18 at 21:07
I think Gregor's suggestion is the best way to go, but it'd be easier to illustrate the details it if your example was easily reproducible (eg, by copy-paste into an r console). For guidance, see: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 — Frank, Aug 06 '18 at 21:12

score 1 · Accepted Answer · answered Aug 06 '18 at 21:27

Reading in your data and setting all but the 1st row of columns b and c to NA:

dd = read.table(header = T, text = "    time   a           b      c
1   18:41  'b d cat'       1     0
2   18:43  'b d dog'       1     0 
3   18:47  'b d lamp'      0     0
4   18:51  'b s dog'       0     0
5   18:52  'b k cat'       0     0
6   18:57  'b c lion'      1     1
7   18:59  'b a dog'       1     1
8   19:23  'b w lamp'      0     1 
9   19:25  'b r cat'       0     1")

dd$b[-1] = NA
dd$c[-1] = NA

dd
#    time        a  b  c
# 1 18:41  b d cat  1  0
# 2 18:43  b d dog NA NA
# 3 18:47 b d lamp NA NA
# 4 18:51  b s dog NA NA
# 5 18:52  b k cat NA NA
# 6 18:57 b c lion NA NA
# 7 18:59  b a dog NA NA
# 8 19:23 b w lamp NA NA
# 9 19:25  b r cat NA NA

I think the above is the starting point (in the future it would be nice if you provide a copy/pasteable starting point as above).

We'll create a new column on_off that holds the shared logic looking for lamp and lion:

dd$on_off = NA
dd$on_off[c(FALSE, grepl(pattern = "lamp", x = dd$a[-1]))] = 0
dd$on_off[c(FALSE, grepl(pattern = "lion", x = dd$a[-1]))] = 1

Then we'll reference that column while searching the string for b or c by itself ("\\b" is a regex pattern for a word boundary, this keeps us from matching, e.g., the "c" in "cat"):

dd$b[-1] = ifelse(grepl("\\bb\\b", dd$a[-1]), dd$on_off[-1], NA)
dd$c[-1] = ifelse(grepl("\\bc\\b", dd$a[-1]), dd$on_off[-1], NA)

Lastly, we fill in the missing values with the previous observation:

dd$b = zoo::na.locf(dd$b)
dd$c = zoo::na.locf(dd$c)

dd
#    time        a b c on_off
# 1 18:41  b d cat 1 0     NA
# 2 18:43  b d dog 1 0     NA
# 3 18:47 b d lamp 0 0      0
# 4 18:51  b s dog 0 0     NA
# 5 18:52  b k cat 0 0     NA
# 6 18:57 b c lion 1 1      1
# 7 18:59  b a dog 1 1     NA
# 8 19:23 b w lamp 0 1      0
# 9 19:25  b r cat 0 1     NA

You can remove the on_off column with dd$on_off = NULL. I leave it in to make things a little clearer.

Thank you for this excellent answer! I added a pasteable table on your recommendation. Would this code be easily scaled for multiple columns? Say, if there were dozens of columns beyond b and c? — Pake, Aug 07 '18 at 12:59
We can scale fairly easily - it's nice to include that need in the original question as well. I'd recommend doing a simple for loop. Do `for (col in )` and replace the `dd$b` with `dd[[col]]` and the `grepl` pattern with `paste0("\\b", col, "\\b")`. I'm at work now, but this evening if you still need help I can work an example. — Gregor Thomas, Aug 07 '18 at 14:09

Conditionally changing cell value when string appears in a separate column in R: time series

1 Answers1