I am trying to reduce the size of my files by removing rows that contain no additional information. For the same ID, I have rows where the Bid and Ask prices do not change from one period to the next. In that case I only want to keep the first observation of the unchanged run.
I'm not sure how to add sample data from my Excel file to this question.
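A common way to share sample data in an R question is to paste the output of `dput()` on a few rows. If the data lives in Excel, it could first be read with something like `readxl::read_excel()`. The sketch below assumes a hypothetical `sample_data` with columns `ID`, `Time`, `Bid` and `Ask`; the values are made up for illustration.

```r
# Hypothetical toy data standing in for a few rows of the real file
sample_data <- data.frame(
  ID   = c(1L, 1L, 1L, 2L),
  Time = c(1L, 2L, 3L, 1L),
  Bid  = c(10.0, 10.0, 10.5, 5.0),
  Ask  = c(10.2, 10.2, 10.6, 5.1)
)

# dput() prints R code that recreates the object; that text is what gets pasted
dput(head(sample_data, 20))

# Anyone reading the question can rebuild the same data.frame from that text
rebuilt <- eval(parse(text = paste(deparse(sample_data), collapse = "\n")))
```

Twenty rows is usually enough, as long as they include at least one run of unchanged Bid/Ask values and one change of ID.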
Please help me write code that efficiently reduces the size of the data.frame by keeping only the first observation for each ID when Bid and Ask are unchanged from the previous period. Efficiency is key, since my files are larger than 5 GB.
I tried using `distinct(data)`, but that does not work because the Time column differs between those rows. I want distinctness to be judged only on the Bid/Ask prices within each ID.
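Restricting the check to specific columns is possible, but it answers a slightly different question. The base R sketch below keeps the first row of each distinct (`ID`, `Bid`, `Ask`) combination, which should behave like `dplyr::distinct(data, ID, Bid, Ask, .keep_all = TRUE)`. The toy data is hypothetical and shows the pitfall: when the price returns to an earlier level, that row is dropped even though it is a change from the previous period.

```r
# Hypothetical data: Bid goes 10 -> 11 -> 10 -> 10 for a single ID
df <- data.frame(
  ID   = c(1L, 1L, 1L, 1L),
  Time = c(1L, 2L, 3L, 4L),
  Bid  = c(10, 11, 10, 10),
  Ask  = c(12, 12, 12, 12)
)

# Keep the first row of each distinct (ID, Bid, Ask) combination, ignoring Time
by_value <- df[!duplicated(df[c("ID", "Bid", "Ask")]), ]

# Only Time 1 and 2 survive; the move back to 10 at Time 3 is lost
by_value$Time
```

So column-restricted `distinct()` removes all repeats of a price level, not just consecutive repeats.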
Grouping by ID and Time is not an option either, since my dataset is far too large and this makes the code too slow.
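One way to avoid grouping entirely is to sort once and compare each row with the row directly above it, using vectorized comparisons. A row is kept if it starts a new ID or if Bid or Ask differs from the previous row. This is a minimal base R sketch; the function name `drop_unchanged_quotes` and the column names `ID`, `Time`, `Bid`, `Ask` are assumptions about the real data, and the example values are hypothetical.

```r
# Hypothetical example data
df <- data.frame(
  ID   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  Time = c(1L, 2L, 3L, 4L, 1L, 2L, 3L),
  Bid  = c(10, 10, 11, 11, 5, 5, 5),
  Ask  = c(12, 12, 12, 12, 6, 6, 7)
)

# Hypothetical helper: keep the first row of each run of unchanged Bid/Ask per ID
drop_unchanged_quotes <- function(d) {
  # Sort once so each ID's quotes are contiguous and in time order
  d <- d[order(d$ID, d$Time), , drop = FALSE]
  n <- nrow(d)
  if (n < 2L) return(d)
  cur  <- 2:n
  prev <- 1:(n - 1L)
  # TRUE where a row starts a new ID or the quote moved versus the row above
  changed <- d$ID[cur] != d$ID[prev] |
    d$Bid[cur] != d$Bid[prev] |
    d$Ask[cur] != d$Ask[prev]
  # Treat NA comparisons as a change, so rows with missing prices are kept
  changed[is.na(changed)] <- TRUE
  # The first row always starts a new run
  d[c(TRUE, changed), , drop = FALSE]
}

reduced <- drop_unchanged_quotes(df)
reduced
```

If the file is already sorted by ID and Time, the `order()` call can be dropped. The same lag comparison could likely be written with `data.table::shift()` after `data.table::setorder()`, which sorts in place and may use less memory on multi-GB data.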