3

I have a data frame called sx16.

It has column names: Date, Open, High, Low, Settle

I want to add a column called up_period that contains a 1 if the calculation below is positive and a 0 if it is negative:

sx16$Settle[ 1: nrow(sx16)] - sx16$Settle[ 2: nrow(sx16)]

Of course, this produces an error as the new list is shorter than the original sx16.

I have tried to wrap rbind.fill around it like so:

sx16$up_period <- rbind.fill(sx16$Settle[ 1: nrow(sx16)] - sx16$Settle[ 2: nrow(sx16)])

But this produces the following warning:

Warning message: In sx16$Settle[1:nrow(sx16)] - sx16$Settle[2:nrow(sx16)] : longer object length is not a multiple of shorter object length

Of course, that is exactly what I thought rbind.fill would solve. Here is where I am stuck. Once I get this, I can add a simple if-else to do the 1 and 0, but I cannot figure out how to add this shorter column to my data frame.

DimaSan
John
  • Welcome to SO. Please have a read at [how to ask a question](http://stackoverflow.com/help/how-to-ask) and [how to make a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Sotos Oct 03 '16 at 09:53
  • using sample data: iris$Sepal.Length[1:(nrow(iris)-1)]-iris$Sepal.Length[2:nrow(iris)] will handle all values except the last one – Oli Oct 03 '16 at 10:01
  • @OliPaul and how are they going to bind that to the data frame? It has one row less. Also all the signs come out opposite (Try `iris$Sepal.Length - c(NA, iris$Sepal.Length[1:nrow(iris) - 1])`) – Sotos Oct 03 '16 at 10:08
  • Don't you mean `iris$Sepal.Length - c(iris$Sepal.Length[2:nrow(iris)], NA)` – Oli Oct 03 '16 at 10:35

4 Answers

2

Try this (last up_period is not defined):

sx16$up_period <- sx16$Settle - c(sx16$Settle[-1],NA)
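
As a minimal sketch of how the padded difference could then be turned into the 1/0 flag the question asks for: the data frame below is a small stand-in holding only the Settle values from the sample data posted in this thread, and treating a zero change as 0 is an assumption.

```r
# Stand-in for sx16 with only the Settle column (values from the thread's sample data)
sx16 <- data.frame(Settle = c(954, 950.25, 945.5, 952.5, 945.25, 955))

# Difference between each row and the next one; NA pads the last row,
# which has no following row to compare against
change <- sx16$Settle - c(sx16$Settle[-1], NA)

# TRUE/FALSE becomes 1/0; the NA in the last row stays NA
sx16$up_period <- as.integer(change > 0)

print(sx16)
```

The padded vector has the same length as the data frame, so the assignment works without any recycling warning.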
Sandipan Dey
  • This worked perfectly. The ",NA" part was something I did not understand. Thanks very much! – John Oct 10 '16 at 04:20
  • The last element is not available for the lagged sequence, NA needed to keep the length of the sequence the same. – Sandipan Dey Oct 10 '16 at 05:14
1

You can use lead from the dplyr package:

library(dplyr)
result <- sx16 %>% mutate(up_period=as.numeric((Settle-lead(Settle,default=NA)) > 0))
##        Date   Open   High    Low Settle up_period
##1 2016-09-30 950.00 958.50 943.00 954.00         1
##2 2016-09-29 947.00 957.25 946.00 950.25         1
##3 2016-09-28 951.75 955.75 944.50 945.50         0
##4 2016-09-27 946.75 953.50 934.00 952.50         1
##5 2016-09-26 951.50 960.25 943.75 945.25         0
##6 2016-09-23 975.00 976.25 952.50 955.00        NA

Here, we explicitly set lead's default parameter to NA, which fills in the value at the end; this shows that we could set it to another value, such as the last value, if we wanted. Note also that there is no need for an if-else, since as.numeric converts the boolean to 1 or 0.
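
To illustrate the effect of the fill value without depending on dplyr, here is a rough base-R sketch. The vector names are made up for the example, and `c(settle[-1], fill)` is only a stand-in for what lead does, not its implementation.

```r
# Settle values from the sample data
settle <- c(954, 950.25, 945.5, 952.5, 945.25, 955)

# Shifted copy padded with NA (mirrors default = NA)
next_na <- c(settle[-1], NA)

# Shifted copy padded with the last value instead
next_last <- c(settle[-1], settle[length(settle)])

# as.numeric turns TRUE/FALSE into 1/0 and keeps NA as NA
flag_na   <- as.numeric(settle - next_na > 0)
flag_last <- as.numeric(settle - next_last > 0)

print(flag_na)
print(flag_last)
```

With NA as the fill, the last flag is NA; with the last value as the fill, the final difference is zero, so the last flag is 0.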

The dput for your data is:

sx16 <- structure(list(Date = structure(c(17074, 17073, 17072, 17071, 
17070, 17067), class = "Date"), Open = c(950, 947, 951.75, 946.75, 
951.5, 975), High = c(958.5, 957.25, 955.75, 953.5, 960.25, 976.25
), Low = c(943, 946, 944.5, 934, 943.75, 952.5), Settle = c(954, 
950.25, 945.5, 952.5, 945.25, 955)), .Names = c("Date", "Open", 
"High", "Low", "Settle"), row.names = c(NA, -6L), class = "data.frame")
aichao
  • This is an excellent solution. I thought dplyr might be my solution, but I am not so familiar with it. I shall have to remedy that. The as.numeric is an elegant solution to the if-else. Thank you. – John Oct 10 '16 at 04:24
1

I'm surprised nobody mentioned diff yet. diff(sx16$Settle) is the equivalent of sx16$Settle[2:nrow(sx16)] - sx16$Settle[1:(nrow(sx16)-1)]. So the following would work for you:

sx16$up_period <- c(ifelse(diff(sx16$Settle)<0, 1, 0), NA)
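
The equivalence stated above, and the sign convention, can be checked on the sample Settle values with a short sketch (the variable names are illustrative only):

```r
# Settle values from the sample data
settle <- c(954, 950.25, 945.5, 952.5, 945.25, 955)
n <- length(settle)

# diff() returns "next minus current", i.e. the opposite sign of the
# "current minus next" calculation in the question
manual <- settle[2:n] - settle[1:(n - 1)]
same <- isTRUE(all.equal(diff(settle), manual))

# A negative diff therefore means the current row is higher than the next one
up_period <- c(ifelse(diff(settle) < 0, 1, 0), NA)

print(diff(settle))
print(up_period)
```

Because of that sign flip, the test is `< 0` rather than `> 0`, and the trailing NA restores the original length.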
plannapus
  • I was trying to use diff, but I was running into a few problems. The main one being it was calculating the change wrong in that it was showing a change of +7 from the first row to the second, instead of the other way around. Your solution clearly works flawlessly though, so I am not sure what I was doing wrong. I will have to go back and look. Thank you. – John Oct 10 '16 at 04:29
0

I'll use the iris data set:

x <- iris
dummy <- x$Sepal.Length             # copy of the column, named dummy
dummy[length(dummy) + 1] <- 0       # append 0 as a placeholder for the day that hasn't happened yet
dummy <- dummy[2:length(dummy)]     # drop the first value so each row lines up with the next one
x <- cbind(x, dummy)                # add the shifted column to the data
x$up <- x$Sepal.Length - x$dummy    # new calculated column
x$dummy <- NULL                     # remove the helper column

So essentially, I added your column again, shifted it by one position, and then calculated using that dummy column.

Oli