This is a Spark port of the question "Read range of files in pySpark".
I have time series data in a data frame that looks like this:
Index  Time  Value_A  Value_B
0      1     A        A
1      2     A        A
2      2     B        A
3      3     A        A
4      5     A        A
I want to drop duplicates in the Value_A and Value_B columns, but only consecutive ones: a row should be dropped only while it repeats the previous row's (Value_A, Value_B) pair, and kept again as soon as the pattern changes. The result for this sample data should be:
Index  Time  Value_A  Value_B
0      1     A        A
2      2     B        A
3      3     A        A