I have a data set, each line representing a "service visit" for customers. A customer might have between 0 and 5 service calls. If there isn't a service call for someone, the columns associated with a service call would all be empty.
+--------------+-------------------+-------------------+------------------------+---------------------+
| Project Name | Customer Name | Service Call.Name | Service Call Date Time | Service Call Status |
+--------------+-------------------+-------------------+------------------------+---------------------+
| OO-99999 | A | SC-001762 | 3/21/2022 7:00:00 PM | Completed |
| OO-99999 | A | SC-002323 | null | Completed |
| OO-99999 | A | SC-002357 | 10/3/2022 7:00:00 PM | 2nd Visit Scheduled |
| OO-88888 | B | SC-001260 | 2/1/2022 8:00:00 PM | Completed |
| OO-88888 | B | SC-002938 | 8/25/2022 7:00:00 PM | Scheduled |
| OO-55555 | C | SC-000957 | 12/27/2021 8:00:00 PM | Completed |
| OO-55555 | C | SC-001418 | 2/7/2022 4:30:00 PM | Completed |
| OO-55555 | C | SC-003007 | null | null |
| OO-66666 | D | SC-001626 | null | No Longer Required |
| OO-66666 | D | SC-002329 | 6/9/2022 7:00:00 PM | Completed |
| OO-66666 | D | SC-002538 | null | Completed |
| OO-66666 | D | SC-002932 | null | Call Reviewed |
| OO-66666 | D | SC-003350 | 9/29/2022 7:00:00 PM | Scheduled |
| OO-11111 | F | null | null | null |
+--------------+-------------------+-------------------+------------------------+---------------------+
My goal is to filter out duplicates. I only want one row per customer, but I want to keep a specific row. A duplicate only appears if someone has multiple service calls.
If someone has a service call (Service Call.Name
not equal to null), and one of those has a service call status
of something OTHER than "Completed" or "Not required", I want to keep that row. So for Customer A, I want the third row since the service call status is not "completed" or "Not required".
If someone has multiple service calls, like customer , and they are all "completed" or "Not required". I don't care which one I keep, as long as I only keep one.
If someone has one service call or no service call, there will be no duplicate of that person, so I want to keep that row.
EDIT
There were cases of duplicates I didn't realize I had, I've edited the data to show them.
For someone with more than one open service call like customer E, I only want to keep one of them. If there is a date for both, I want the latest date of the two. If one has a date and the other doesn't, i want the one with a date. If neither have a date, i don't care which is kept, but i only want one.
I am working in Power BI, but I have access to R and think that might be easier.