Data:

DB <- data.frame(orderID = c(1,2,3,4,4,5,6,6,7,8),
orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","13.1.12","12.1.12","10.1.12","10.1.12","21.1.12","24.1.12"),
itemID = c(2,3,2,5,12,4,2,3,1,5),
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99),
orderItemStatus = c("sold", "sold", "sold", "refunded", "sold", "refunded", "sold", "refunded", "sold", "refunded"))

Expected outcome:

DB <- data.frame(orderID = c(1,2,3,4,6,7),
orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","10.1.12","21.1.12"),
itemID = c(2,3,2,12,2,1),
customerID = c(1, 2, 3, 1, 2, 1),
itemPrice = c(9.99, 14.99, 9.99, 29.99, 9.99, 49.99),
orderItemStatus = c("sold", "sold", "sold", "sold", "sold", "sold"))

For Understanding:

The orderID values are consecutive. Products ordered by the same customerID on the same day get the same orderID. When the same customer orders products on another day, that order gets a new orderID.

I want to delete all order items (rows) with orderItemStatus = refunded. How can I do this? (I think it's quite simple, and I found "Removing specific rows from a dataframe", but I don't understand how it works, so please help me :( )

-> The original data has about 500k rows, so please suggest a solution that performs well.

Thanks a lot for your support!

AbsoluteBeginner

1 Answer

The following code should do the job:

DB_new <- DB[-which(DB$orderItemStatus == "refunded"), ]

Here, which() returns the indices of the rows where the comparison is TRUE, and the minus sign drops those rows. E.g. with DB[-c(1,5,10), ] you remove rows 1, 5 and 10. You could also do it in two steps:

indices_to_remove <- which(DB$orderItemStatus == "refunded")
DB_new <- DB[-indices_to_remove, ]
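
One caveat with negative which() indexing is worth a quick sketch. The data frame `toy` and the names `idx`, `dropped_all` and `safe` are made up for illustration and are not part of the question. If no row matches, which() returns an empty vector, and indexing with a negated empty vector selects zero rows rather than all of them:

```r
# Hypothetical toy data that contains no refunded rows at all
toy <- data.frame(orderID = c(1, 2, 3),
                  orderItemStatus = c("sold", "sold", "sold"))

idx <- which(toy$orderItemStatus == "refunded")  # integer(0): nothing matched

# -integer(0) is still integer(0), so this keeps 0 rows instead of all 3
dropped_all <- toy[-idx, ]

# Guarding on length() leaves the data untouched when nothing matches
safe <- if (length(idx) > 0) toy[-idx, ] else toy
```

With the sample data from the question there are refunded rows, so the original one-liner behaves as intended; the guard only matters when the status might be absent.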

The other way, suggested by @rosscova in the comments, is to find the indices of the rows you want to keep and assign those rows to the result.
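
The comment itself is not reproduced here, so the following is only a sketch of what that keep-the-wanted-rows approach might look like, using a logical vector instead of which(). The name `keep` is an assumption, and the data frame is a trimmed copy of the question's sample (only three of its columns):

```r
# Trimmed version of the sample data from the question
DB <- data.frame(
  orderID = c(1, 2, 3, 4, 4, 5, 6, 6, 7, 8),
  itemID = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
  orderItemStatus = c("sold", "sold", "sold", "refunded", "sold",
                      "refunded", "sold", "refunded", "sold", "refunded")
)

# Logical vector: TRUE for every row that should stay
keep <- DB$orderItemStatus != "refunded"

# Row-subset with the logical vector; all columns are retained
DB_new <- DB[keep, ]
```

This variant also returns all rows unchanged when nothing is refunded. Both approaches rely on vectorized comparisons, so about 500k rows should not be a problem.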

Phann