
I have stock price data in 1-minute bars. I need to delete the row corresponding to the first minute of each day, plus the following 29 rows. The first row of each day always has a value >60 in the `time_difference` variable. If I write `del <- df[which(df$time_difference > 60), ]` and then `df_new = anti_join(df, sel, by = "Time")`, that only picks out the first row of each day. However, I need to remove the next 29 rows as well.

Here is a sample of the df. I also added a `time_difference` vector, computed as the difference between each row and the next row for the `Time` variable (not displayed here). The df file can be downloaded from here

            Time   Open   High    Low  Close Volume     Wap Gap Count
    1 1536154200 234.61 234.95 234.57 234.76    302 234.600   0    31
    2 1536154260 234.76 235.23 234.76 235.16    135 235.008   0    94
    3 1536154320 235.09 235.33 234.88 235.33    121 235.010   0   109
    4 1536154380 235.24 235.35 235.08 235.35     24 235.203   0    22
    5 1536154440 235.27 235.47 235.22 235.42     62 235.340   0    35
    6 1536154500 235.39 235.81 235.39 235.63    136 235.633   0   110

Davide Piffer
  • Adding a minimum working example following the guidelines [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) will help you get the best answers. – mikebader Sep 07 '21 at 16:12
  • Is there a typo in the anti_join argument `sel`? Would it be the name of the data frame just created above, currently called `del`. – Pablo Adames Sep 10 '21 at 13:38

1 Answer


My original answer only works on one set of rows at a time. Thanks for sharing your data. Below is an updated answer. Note that this is sensitive to your data being in chronological order as we are using row indices rather than the actual time!

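    dat <- read.csv("MSFT.3years.csv")

    # Row indices where a new day starts (time_difference exceeds 60 seconds)
    startofday <- which(dat$time_difference > 60)

    # For each day start, the indices of that row and the 29 rows after it
    removerows <- unlist(Map(`:`, startofday, startofday + 29))

    # Drop those rows by negative indexing
    dat_new <- dat[-removerows, ]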
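For anyone who wants to try this without the CSV, here is a minimal sketch of the same row-index approach on made-up data. The toy data frame, the way `time_difference` is built here (gap to the previous row, with `Inf` for the very first row), and the two guards at the end are my assumptions, not part of the original data. The guards cover two edge cases: a day with fewer than 30 rows at the end of the file, and the case where no day start is found (in R, `dat[-integer(0), ]` returns zero rows rather than all of them).

```r
# Hypothetical toy data: two days of 40 one-minute bars each.
day1 <- 1536154200 + 60 * (0:39)
dat <- data.frame(Time = c(day1, day1 + 86400))

# Row indices are only meaningful if the rows are in chronological order.
dat <- dat[order(dat$Time), , drop = FALSE]

# Assumed construction: gap to the previous row in seconds,
# with Inf on the first row so it counts as a day start.
dat$time_difference <- c(Inf, diff(dat$Time))

# Same logic as above: day starts, then each start plus the next 29 rows.
startofday <- which(dat$time_difference > 60)
removerows <- unlist(Map(`:`, startofday, startofday + 29))

# Guard 1: ignore indices that run past the end of the data.
removerows <- removerows[removerows <= nrow(dat)]

# Guard 2: negative indexing with an empty vector would drop every row.
dat_new <- if (length(removerows) > 0) dat[-removerows, , drop = FALSE] else dat

nrow(dat_new)
```

With this toy input, each day loses its first 30 bars and keeps the remaining 10.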

Inspired by: Generate sequence between each element of 2 vectors
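If the rows might not be in chronological order, one possible alternative is to filter on the timestamps themselves instead of row positions. This is only a sketch under assumptions that are mine: `Time` is Unix seconds, a trading session never spans UTC midnight, and "the first 30 rows" can be read as "the first 30 minutes after the day's first bar". That last reading gives a different result from the row-index version whenever a minute bar is missing.

```r
# Hypothetical toy data: two sessions of 40 one-minute bars, deliberately shuffled.
day1 <- 1536154200 + 60 * (0:39)
dat <- data.frame(Time = c(day1, day1 + 86400))
set.seed(1)
dat <- dat[sample(nrow(dat)), , drop = FALSE]

# UTC day number of each bar (assumes Time is in Unix seconds).
day <- dat$Time %/% 86400

# First timestamp of each day, repeated for every row of that day.
first_bar <- ave(dat$Time, day, FUN = min)

# Keep only bars that start at least 30 minutes after the day's first bar.
dat_time <- dat[dat$Time - first_bar >= 30 * 60, , drop = FALSE]

nrow(dat_time)
```

Because nothing here depends on row order, the shuffle does not affect which bars are kept.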

Skaqqs
  • Thanks. However, I get the following error: `Error in row:(row + 29) : NA/NaN argument`. In addition, two warning messages: `1: In row:(row + 29) : numerical expression has 18 elements: only the first used` and `2: In row:(row + 29) : numerical expression has 18 elements: only the first used` – Davide Piffer Sep 07 '21 at 16:44
  • I updated my answer based on the data you linked. – Skaqqs Sep 07 '21 at 17:25
  • Thanks. It works now. Luckily, the data is in chronological order. – Davide Piffer Sep 07 '21 at 18:36