
The dataset is Netflix stock prices with 8 variables (8 columns), so I am picking the 3 columns I need by using

select("date", "open", "close")

  date        open close
   <date>     <dbl> <dbl>
 1 2011-01-03  25    25.5
 2 2011-01-04  25.9  25.9
 3 2011-01-05  25.9  25.7
 4 2011-01-06  25.2  25.4
 5 2011-01-07  25.5  25.6
 6 2011-01-10  25.7  26.8
 7 2011-01-11  27.1  26.7
 8 2011-01-12  26.9  27.0
 9 2011-01-13  26.9  27.4
10 2011-01-14  27.3  27.4
  1. I want to keep only the rows where the opening price is higher than the previous day's closing price.
  2. The closing price also has to be higher than the opening price for that same day.

So for this dataset only 3 rows qualify: Jan 7th, Jan 10th and Jan 12th. If somebody could help me understand how to code this, I would really appreciate it.
Ronak Shah
Maiki
  • It would be helpful to allow others to reproduce your data easily. See https://stackoverflow.com/q/5963269/6607497 for suggestions. – U. Windl Jan 15 '21 at 07:01

1 Answer


Using dplyr, you can do:

library(dplyr)
# lag(close) is the previous row's close (NA for the first row, which filter() drops)
result <- df %>% filter(open > lag(close), close > open)
result
#        date open close
#1 2011-01-07 25.5  25.6
#2 2011-01-10 25.7  26.8
#3 2011-01-12 26.9  27.0
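
To see why exactly these three rows qualify, it can help to spell out the two conditions as columns before filtering. This is only a sketch: the column names prev_close, gap_up and up_day are made up for illustration, and the arrange(date) step is an added assumption that the rows might not already be in date order (lag() works on row order, not on dates).

```r
library(dplyr)

# Same ten rows as the sample in the question
df <- data.frame(
  date  = as.Date(c("2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06",
                    "2011-01-07", "2011-01-10", "2011-01-11", "2011-01-12",
                    "2011-01-13", "2011-01-14")),
  open  = c(25, 25.9, 25.9, 25.2, 25.5, 25.7, 27.1, 26.9, 26.9, 27.3),
  close = c(25.5, 25.9, 25.7, 25.4, 25.6, 26.8, 26.7, 27, 27.4, 27.4)
)

checked <- df %>%
  arrange(date) %>%                       # make sure "previous row" means "previous trading day"
  mutate(prev_close = lag(close),         # NA for the first row
         gap_up     = open > prev_close,  # condition 1: opened above the previous close
         up_day     = close > open)       # condition 2: closed above its own open

# filter() keeps rows where both flags are TRUE; rows with NA are dropped
result <- checked %>% filter(gap_up, up_day)
result
```

Printing checked instead of result shows the flags for every day, which makes it easy to check a row such as Jan 11th: it opened above the previous close but closed below its open, so it is excluded.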

The same in base R and data.table:

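# Base R: shift close down one row by prepending NA and dropping the last value
subset(df, open > c(NA, close[-nrow(df)]) & close > open)

# data.table: shift() lags by one row by default; setDT() converts df by reference
library(data.table)
setDT(df)[open > shift(close) & close > open]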
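
One detail worth knowing if you index with `[` instead of subset(): the first row has no previous close, so its condition is NA, and a plain logical index would return a row of NAs for it. A minimal base R sketch, assuming the same sample data; the names prev_close and keep are just for illustration:

```r
# Same ten rows as the sample in the question
df <- data.frame(
  date  = as.Date(c("2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06",
                    "2011-01-07", "2011-01-10", "2011-01-11", "2011-01-12",
                    "2011-01-13", "2011-01-14")),
  open  = c(25, 25.9, 25.9, 25.2, 25.5, 25.7, 27.1, 26.9, 26.9, 27.3),
  close = c(25.5, 25.9, 25.7, 25.4, 25.6, 26.8, 26.7, 27, 27.4, 27.4)
)

# Previous day's close, aligned with the current row (NA for the first row)
prev_close <- c(NA, head(df$close, -1))

# Both conditions at once; the first element is NA because prev_close[1] is NA
keep <- df$open > prev_close & df$close > df$open

# which() returns only the positions that are TRUE, so the NA row is skipped
result <- df[which(keep), ]
result
```

subset() treats NA as FALSE on its own, which is why the one-liner above needs no extra handling.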

data

df <- structure(list(date = structure(c(14977, 14978, 14979, 14980, 
14981, 14984, 14985, 14986, 14987, 14988), class = "Date"), open = c(25, 
25.9, 25.9, 25.2, 25.5, 25.7, 27.1, 26.9, 26.9, 27.3), close = c(25.5, 
25.9, 25.7, 25.4, 25.6, 26.8, 26.7, 27, 27.4, 27.4)), row.names = c(NA, 
-10L), class = "data.frame")
Ronak Shah