I've been trying to solve this problem for two days and I'm tearing my hair out.
I have a dataset with nearly 15 million rows of minute-by-minute data. A few days' worth of those data points are artifacts that I need to remove from the dataset.
I know the syntax for deleting rows once I have their row numbers:
DataNoArtifacts <- Data[-(5039761:5041201), ]
This code has worked for me in the past and continues to work for me. My problem is FINDING the actual values that I need to delete so that I can get the row numbers, or range of row numbers, to put in the code.
When I try to filter the dataset to find the exact date and minutes I need to filter out, I can easily find the times. However, filtering the dataset to get them assigns them new row numbers. I need to be able to filter the data and see the original row numbers but cannot.
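For reference, here is a minimal sketch of two ways the original row numbers can be kept visible while filtering. It uses a toy data frame, and the column names `DateTime`, `WindSpeed` and `OrigRow` are assumptions, not the real column names. `which()` returns positions in the unfiltered data frame, and a stored row-number column survives any later filtering.

```r
# Toy stand-in for the real data; column names are assumptions.
Data <- data.frame(
  DateTime  = seq(as.POSIXct("2014-03-10 00:00", tz = "UTC"),
                  by = "1 min", length.out = 10),
  WindSpeed = c(1, 2, 20, 1, 3, 2, 0, 1, 2, 1)
)

# Option 1: which() gives row positions in the ORIGINAL data frame.
start <- as.POSIXct("2014-03-10 00:02", tz = "UTC")
end   <- as.POSIXct("2014-03-10 00:04", tz = "UTC")
bad_rows <- which(Data$DateTime >= start & Data$DateTime <= end)
print(bad_rows)

# Option 2: store the row number as a column so it stays attached
# to each row after filtering.
Data$OrigRow <- seq_len(nrow(Data))
print(subset(Data, WindSpeed > 10))

# Negative indexing drops the rows, but only if bad_rows is non-empty:
# Data[-integer(0), ] would return zero rows instead of all of them.
if (length(bad_rows) > 0) {
  DataNoArtifacts <- Data[-bad_rows, ]
} else {
  DataNoArtifacts <- Data
}
```

With either option the numbers can go straight into the `Data[-(...), ]` syntax above, with no scrolling.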
So I tried to solve this by scrolling through my 15,000,000-row dataset to find the rows manually, since that seemed to be the only option. The problem is that if I scroll more than one click or so at a time, the view jumps up or down a few thousand rows, which makes it nearly impossible to find the row number for any specific day, MUCH less the exact hour and minute I need. If I finally get within a couple of weeks of the data point I need to delete, the only way to be sure of finding it is to click one click at a time... through 2 weeks or so of data that is broken up by the minute.
I have some days with data I need to delete with a huge range (example: 12/11/2003 has a few ranges of many hours at a time with equipment error that I need to delete), and some of the days with data I need to delete have only a few randomly interspersed minutes within the entire day that are problematic (example: 3/10/2014 with constant wind speeds between 0 and 3 m/s, with about 40 random blips in the data of the 1,440 minutes throughout the day that are anomalies like 20 m/s).
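To make the first case concrete, here is a rough sketch of removing several time ranges within one day without looking up any row numbers. The ranges below are made up for illustration, and the column names `DateTime` and `WindSpeed` are assumptions. Each range is turned into a logical mask, and the masks are combined.

```r
# One day of minute-by-minute toy data (1,440 rows); names are assumptions.
Data <- data.frame(
  DateTime = seq(as.POSIXct("2003-12-11 00:00", tz = "UTC"),
                 by = "1 min", length.out = 1440)
)
Data$WindSpeed <- 2

# Hypothetical artifact ranges; the real start/end times would go here.
ranges <- data.frame(
  start = as.POSIXct(c("2003-12-11 02:00", "2003-12-11 15:30"), tz = "UTC"),
  end   = as.POSIXct(c("2003-12-11 05:59", "2003-12-11 18:00"), tz = "UTC")
)

# Build one TRUE/FALSE flag per row: TRUE if the row falls in any range.
is_artifact <- rep(FALSE, nrow(Data))
for (i in seq_len(nrow(ranges))) {
  in_range <- Data$DateTime >= ranges$start[i] & Data$DateTime <= ranges$end[i]
  is_artifact <- is_artifact | in_range
}

# Logical indexing keeps every row that is not flagged.
DataNoArtifacts <- Data[!is_artifact, ]
```

`which(is_artifact)` would give the original row numbers if they are still wanted, but the logical mask removes the rows directly.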
Long story short: I know how to delete the values I need to delete. But for the life of me, I cannot get R to show me which rows they are.
Unfortunately, the data points I need to delete are not exclusively days/hours/minutes above or below a certain value I can filter out. It's the times AROUND specific points that indicate artifacts that I need to delete.
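As an illustration of the "times around specific points" idea, here is a small sketch that flags some points and then removes a window of rows on either side of each one. The threshold of 10 m/s and the window size `pad` are placeholders; the flagging step stands in for however the indicator points are actually identified. The column names `Minute` and `WindSpeed` are assumptions.

```r
# Toy data with two blips; column names are assumptions.
WindSpeed <- c(1, 2, 1, 20, 2, 1, 3, 2, 1, 2, 18, 1)
Data <- data.frame(Minute = seq_along(WindSpeed), WindSpeed = WindSpeed)

# Step 1: original row numbers of the indicator points (placeholder rule).
flagged <- which(Data$WindSpeed > 10)

# Step 2: expand each flagged row into a window of +/- pad rows.
pad <- 2
around <- unlist(lapply(flagged, function(i) (i - pad):(i + pad)))

# Step 3: drop positions that fall outside the data and any duplicates.
around <- unique(around[around >= 1 & around <= nrow(Data)])

# Step 4: keep only rows that are not inside any window.
DataNoArtifacts <- Data[!(seq_len(nrow(Data)) %in% around), ]
print(DataNoArtifacts)
```

Because rows are one minute apart, `pad` here means minutes; if the data has gaps, a window defined on the timestamps would be the safer choice.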