Use of lag function

Question

Let me try to reformulate my question.

I have two arrays of data (columns E and F in the example worksheet). I want to create a new array with these rules:

If E < -0.03 then 0 else if F > 0.03 then 1 else carry over previous value of the new array in formation.

In my example worksheet this is all done in one column (column H). I want to create the same array (column H) in R. My problem is that in R you can't refer to an array before it has been completely created.

I can't think of a way to get around this problem. Which technique would you use to create the array in column H in R?

https://dl.dropboxusercontent.com/u/102669/example.xlsx

`iif` seems to be from https://github.com/systematicinvestor/SIT/blob/master/R/utils.r. What is `class(signal)`? Possibly the lagging does not work because it is not a time series (xts for example). — thie1e, Mar 01 '15 at 19:12
all objects are xts class. i used ifelse function but got the same results. Seems that lag function doesn't do its work when used in a recursive expression. But i'm not sure about that — Fryc, Mar 01 '15 at 19:20
Please provide a [reproducible example](http://stackoverflow.com/q/5963269/271616), and be very explicit about what you expect and how the results you get differ from what you expect. Otherwise, all we can do is guess, and that's not productive for anyone. — Joshua Ulrich, Mar 01 '15 at 19:31
@Fryc your expression isn't recursive at all. Perhaps you just need to do this with two separate lines and an intermediate variable? — Gregor Thomas, Mar 01 '15 at 20:28
@Gregor can you give me an example in my new edited question? How would be possible to have zero (previous value of the new signal array) and not 11? — Fryc, Mar 01 '15 at 20:40
You need to explain in words the logic you are trying to implement. Your code right now says "if `sig1` is less than 3, then 0..." which makes the first two 0, "otherwise, if `sig2` is greater than 8, then 0.2", which makes the last two 0.2, "otherwise, use the (lagged) original data" which is all 11, so the one left over is 11. Since you don't describe your logic in words, it's not at all clear *how* you want 0 to be determined. Do you want `sig1 <= 3`, or do you want to lag the output from the `sig1` test? — Gregor Thomas, Mar 02 '15 at 06:05
i want zero because zero it is the previous value of the new signal vector (0,0,11,0.2,0.2). We have two signal vector, the old one (11,11,11,11,11) and the new one we are creating: signal = ifelse...etc. In other words, if first condition then zero, if second condition then 0.2 otherwise the previous value of the new rising signal, whatever it is. — Fryc, Mar 02 '15 at 08:07

Gregor Thomas · Accepted Answer · 2015-03-06T22:30:43.490

1

Okay, your last comment gets at some of the confusion:

> We have two signal vector, the old one (11,11,11,11,11) and the new one we are creating: signal = ifelse...etc.

In R, you can't reference a new variable as it is being created; you have to finish creating it first.

That said, you still haven't explained, in words, what you want to do, so it's very difficult to try to correct your code. I understand exactly what your code does, and why, but it's very difficult to know what you actually want since you haven't explained your logic. (This probably explains the downvotes on your question.) So this is my best guess.

## The set-up
signal = c(11, 11, 11, 11, 11) 
sig1 = c(1, 2, 3, 4, 5) 
sig2 = c(6, 7, 8, 9, 10)

## Let's get a temp variable, the thing we want to lag
## (again, this is a guess)
(sig.temp = ifelse(sig1 < 3, 0, signal))
# [1]  0  0 11 11 11

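## dplyr::lag() shifts a plain vector down one position
## (base stats::lag() would leave the values where they are)
(new.signal = ifelse(sig1 < 3, 0, ifelse(sig2 > 8, 0.2, dplyr::lag(sig.temp))))
# [1] 0.0 0.0 0.0 0.2 0.2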
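
A note on `lag()`: on a plain numeric vector, base R's `stats::lag()` only adjusts the time-series attribute and leaves the values in place, so the shift has to come from somewhere else (dplyr, xts, or by hand). A minimal base-R sketch of a manual shift, where `x` is just a stand-in for `sig.temp`:

```r
x <- c(0, 0, 11, 11, 11)  # same values as sig.temp above

# Base stats::lag() on a plain vector: the values themselves do not move
unshifted <- as.numeric(stats::lag(x))

# Manual one-step lag: prepend NA and drop the last element
shifted <- c(NA, head(x, -1))

unshifted
shifted
```

The manual version gives `NA` in the first position, which is what a one-step lag on a plain vector should produce, since there is no earlier value to use.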

Edits:

# Another way, this time doing both comparisons before the lag
sig.temp2 = ifelse(sig1 < 3, 0, ifelse(sig2 > 8, 0.2, signal))
new.signal = ifelse(sig1 < 3 | sig2 > 8, sig.temp2, dplyr::lag(sig.temp2))
# [1] 0.0 0.0 0.0 0.2 0.2

The difference between R and Excel in this is that Excel will do things one-at-a-time, and auto-update based on changes. R will never auto-update. For example, in R

x = 1
y = x + 1
# y is 2
x = 5
y 
# [1] 2
# y is still 2

However, in Excel, if you set B1 = A1 + 1, then that relationship will be maintained. Because R doesn't auto-update, and R doesn't like to do things one-at-a-time (it creates a vector all at once, not one row at a time), you need a temp variable.
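
If a literal translation of the spreadsheet is easier to follow, an explicit loop can carry the previous value along, one row at a time. This is only a sketch: `e`, `f`, and `h` are hypothetical stand-ins for columns E, F, and H, the toy values are made up, and the thresholds follow the rule stated in the question.

```r
# Toy inputs (made-up values, not taken from the spreadsheet)
e <- c(0.05, 0.01, -0.05, 0.01, 0.02)
f <- c(0.06, 0.01,  0.00, 0.01, 0.05)

h <- numeric(length(e))
prev <- NA_real_  # there is no previous value before the first row

for (i in seq_along(e)) {
  if (e[i] < -0.03) {
    h[i] <- 0
  } else if (f[i] > 0.03) {
    h[i] <- 1
  } else {
    h[i] <- prev  # neither rule fires: carry the previous result over
  }
  prev <- h[i]
}

h
```

A loop like this is usually slower than a vectorized approach on long inputs, but it mirrors the "column refers to itself" formula from Excel directly.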

More edits

Okay, looking more carefully at your spreadsheet, column D isn't used at all, just like the `c(11, 11, 11, ...)` in your original question wasn't used at all. The only columns that matter are `sig1` and `sig2`, that is, columns E and F. Here is the relevant data from Excel, rows 14-36:

col_e = c(14.286, 13.333, 12.5, 11.765, 8.333, 5.263, 7.692, 7.5, -4.762, 
          -2.326, -7.5, -4.762, 2.703, -7.5, 2.632, 7.027, 0, -1.768, -1.026, 
          -4.37, -3.109, 2.043, -0.588) / 100

col_f = c(6.67, 6.25, 5.88, 5.56, 2.63, 2.56, 5, 2.38, -6.98, 5, -11.9, 
          8.11, -5, -2.63, 5.41, 1.54, -1.52, -0.26, -0.77, -3.63, 0.54, 
          1.5, -2.05) / 100

Along with your desired result:

desired_result = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 
                   1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L)

Now let's code up your logic. For the exception case, we'll fill in a missing value:

col_g = ifelse(col_e < -0.03, 0, ifelse(col_f > 0.03, 1, NA))

We then want to fill in the missing values (NAs) with the previous non-missing value. This is done nicely with zoo::na.locf() (stands for Last Observation Carried Forward):

library(zoo)
col_g = na.locf(col_g)

Does it match Excel?

all(na.locf(col_g) == desired_result)
# [1] TRUE

Yes.
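
For intuition about what "last observation carried forward" does, here is a rough base-R sketch. `locf_base` is a hypothetical helper written for illustration and is not part of zoo. It keeps leading `NA`s in place, whereas `zoo::na.locf()` drops them by default (pass `na.rm = FALSE` to keep them).

```r
# Hypothetical helper: carry the last non-missing value forward
locf_base <- function(x) {
  seen <- cumsum(!is.na(x))       # how many non-missing values seen so far
  c(NA, x[!is.na(x)])[seen + 1]   # seen == 0 (leading NAs) maps to NA
}

filled <- locf_base(c(NA, 1, NA, NA, 0, NA))
filled
```

Keeping leading `NA`s matters when the result has to stay the same length as the input, for example when it is written back as a column next to E and F.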

If you want to do this in one line, you can nest the statements:

col_g = na.locf(ifelse(col_e < -0.03, 0, ifelse(col_f > 0.03, 1, NA)))

Now that you gave your full code...

I called your output column "desired" in Excel, and read your data into R. Works just fine, all 3367 rows:

dat = read.table("clipboard", header = TRUE)

result = zoo::na.locf(ifelse(dat$lambda < -8, 0, ifelse(dat$omega > 6, 1, NA)))
all(result == dat$desired)
# [1] TRUE

edited Mar 06 '15 at 22:30

answered Mar 02 '15 at 23:01

Gregor Thomas


Thanks Gregor. I'm terribly sorry for my inability to communicate correctly. What is the consequence of down reputation? I didn't even know about that. OK, I understand the logic of a temp variable now. Thanks for showing me – Fryc Mar 03 '15 at 11:27
@Fryc it's just something to work on in the future. If you spend some time on Stack Overflow, read other questions, you will see what makes a good question. – Gregor Thomas Mar 03 '15 at 16:32
Thanks for the advice, Gregor. Throughout this topic my simple question was: how do I reference an array as it is being created? You told me that it is not possible. My hope was that it would be possible in some way, but it is not. This is a big problem, because after hours of thought I'm not able to figure out how to solve it. I know that it's not compliant, but I don't know how to communicate my problem clearly. Please don't give me more down reputation. I can't think of a way to do in R what I do in a worksheet in a single array (column H). – Fryc Mar 03 '15 at 21:43
I'm terrible sorry for an external link, but i really don't know how to express my programming doubt. https://dl.dropboxusercontent.com/u/102669/example.xlsx – Fryc Mar 03 '15 at 21:44
@Fryc You don't have a single column in Excel, you have two intermediate columns as well, E and F. You should make R objects that correspond to columns E and F, then column H is easy. If this doesn't make sense, you should ask a new question rather than continue this old question in comments. – Gregor Thomas Mar 04 '15 at 01:42
No Gregor, columns E and F are not intermediate columns but simply input. In fact i can make R object for them but the problem remains the same. The problem is that column H call itself and i don't know to translate this in R. Do you think i should create a new question? help me to be compliance to the rules of stack overflow – Fryc Mar 04 '15 at 06:15
Yes. Create a new question and describe **in words**, not just code, what you are trying to do. – Gregor Thomas Mar 04 '15 at 17:01
@Fryc But also I'm surprised that you say all 3 columns in Excel are inputs, but you seem to have only 1 column (vector) in R of input. Maybe if you use words to describe the calculation it will clear things up. – Gregor Thomas Mar 04 '15 at 17:33
@Fryc no, please do not change a question that's already been answered. Please ask a new question. – Gregor Thomas Mar 04 '15 at 20:56
thanks Gregor for your support. Of course your solution is not even near what the array H (in excel) does. If was so simple.... The problem arise from signal = c(11, 11, 11, 11, 11) . It doesn't exist, it was only an example. The reality of my problem is all in excel. For now i quit with this problem. In the future i will try again with a new question. – Fryc Mar 05 '15 at 21:07
@Fryc using the actual data from Excel, I have duplicated your column H. I'm confused now why you ever mention column D at all for this function. Column H depends only on Columns E and F. I used only the rows where you had a result for H, rows 14 to 36. – Gregor Thomas Mar 05 '15 at 22:08
You're right, columns C and D do not matter. Column H depends only on E and F. You duplicated column H only because in my example.xlsx file there is a small sample. So you can't see how more complicated is a simple recursive recall to the same array in formation. For your curiosity i link the complete excel, example2.xlsx. You can easily see how your R code can't reproduce H column https://dl.dropboxusercontent.com/u/102669/Example2.xlsx – Fryc Mar 06 '15 at 20:40
@Fryc I easily see that my R code works just fine to exactly match the H column. – Gregor Thomas Mar 06 '15 at 21:44
(see the most recent additions to the answer) – Gregor Thomas Mar 06 '15 at 22:25
i don't have words to thank you Gregor. na.locf is the "trick" i was searching for. Thank you very much! – Fryc Mar 07 '15 at 07:51
