Data:
DB <- data.frame(orderID = c(1,2,3,4,4,5,6,6,7,8),
orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","13.1.12","12.1.12","10.1.12","10.1.12","21.1.12","24.1.12"),
itemID = c(2,3,2,5,12,4,2,3,1,5),
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99),
orderItemStatus = c("sold", "sold", "sold", "refunded", "sold", "refunded", "sold", "refunded", "sold", "refunded"))
Expected outcome:
DB <- data.frame(orderID = c(1,2,3,4,6,7),
orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","10.1.12","21.1.12"),
itemID = c(2,3,2,12,2,1),
customerID = c(1, 2, 3, 1, 2, 1),
itemPrice = c(9.99, 14.99, 9.99, 29.99, 9.99, 49.99),
orderItemStatus = c("sold", "sold", "sold", "sold", "sold", "sold"))
For understanding:
The orderID is continuous. Products ordered by the same customerID on the same day get the same orderID. When the same customer orders products on another day, that order gets a new orderID.
I want to delete all rows with orderItemStatus = refunded. How can I do this? (I think it's quite simple, and I found "Removing specific rows from a dataframe", but I don't understand how it works, so please help me :( )
-> The original data has about 500k rows, so please suggest a solution that performs well.
Thanks a lot for your support!
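One possible approach, as a minimal sketch: base R logical indexing keeps only the rows whose orderItemStatus is not "refunded". The sketch assumes the status is stored as character strings, uses a reduced version of the sample data above, and the names keep and DB_sold are only illustrative. The comparison is vectorized, so it should cope with roughly 500k rows without trouble, although that has not been measured here.

```r
# Reduced sample in the shape of the question's data (status stored as strings).
DB <- data.frame(
  orderID = c(1, 2, 3, 4, 4, 5, 6, 6, 7, 8),
  itemID = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
  orderItemStatus = c("sold", "sold", "sold", "refunded", "sold",
                      "refunded", "sold", "refunded", "sold", "refunded"),
  stringsAsFactors = FALSE
)

# Build a logical vector (TRUE = keep the row) and index the rows with it.
# %in% never returns NA, so rows with a missing status are kept as they are
# instead of being turned into all-NA rows, which `!=` would do.
keep <- !(DB$orderItemStatus %in% "refunded")
DB_sold <- DB[keep, ]

# Optional: reset the row names so they run from 1 to n again.
rownames(DB_sold) <- NULL

print(DB_sold)
```

The same filter can be written as subset(DB, orderItemStatus != "refunded"), which reads more easily but drops rows where the status is NA.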