1

I have a data set, each line representing a "service visit" for customers. A customer might have between 0 and 5 service calls. If there isn't a service call for someone, the columns associated with a service call would all be empty.

+--------------+-------------------+-------------------+------------------------+---------------------+
| Project Name |   Customer Name   | Service Call.Name | Service Call Date Time | Service Call Status |
+--------------+-------------------+-------------------+------------------------+---------------------+
| OO-99999     | A                 | SC-001762         | 3/21/2022 7:00:00 PM   | Completed           |
| OO-99999     | A                 | SC-002323         | null                   | Completed           |
| OO-99999     | A                 | SC-002357         | 10/3/2022 7:00:00 PM   | 2nd Visit Scheduled |
| OO-88888     | B                 | SC-001260         | 2/1/2022 8:00:00 PM    | Completed           |
| OO-88888     | B                 | SC-002938         | 8/25/2022 7:00:00 PM   | Scheduled           |
| OO-55555     | C                 | SC-000957         | 12/27/2021 8:00:00 PM  | Completed           |
| OO-55555     | C                 | SC-001418         | 2/7/2022 4:30:00 PM    | Completed           |
| OO-55555     | C                 | SC-003007         | null                   | null                |
| OO-66666     | D                 | SC-001626         | null                   | No Longer Required  |
| OO-66666     | D                 | SC-002329         | 6/9/2022 7:00:00 PM    | Completed           |
| OO-66666     | D                 | SC-002538         | null                   | Completed           |
| OO-66666     | D                 | SC-002932         | null                   | Call Reviewed       |
| OO-66666     | D                 | SC-003350         | 9/29/2022 7:00:00 PM   | Scheduled           |
| OO-11111     | F | null              | null                   | null                |
+--------------+-------------------+-------------------+------------------------+---------------------+

My goal is to filter out duplicates. I only want one row per customer, but I want to keep a specific row. A duplicate only appears if someone has multiple service calls.

If someone has a service call (Service Call.Name not equal to null), and one of those has a service call status of something OTHER than "Completed" or "Not required", I want to keep that row. So for Customer A, I want the third row since the service call status is not "completed" or "Not required".

If someone has multiple service calls, like customer , and they are all "completed" or "Not required". I don't care which one I keep, as long as I only keep one.

If someone has one service call or no service call, there will be no duplicate of that person, so I want to keep that row.

EDIT

There were cases of duplicates I didn't realize I had, I've edited the data to show them.

For someone with more than one open service call like customer E, I only want to keep one of them. If there is a date for both, I want the latest date of the two. If one has a date and the other doesn't, i want the one with a date. If neither have a date, i don't care which is kept, but i only want one.

I am working in Power BI, but I have access to R and think that might be easier.

1 Answers1

3

Here is a solution. duplicated will give what rows to keep by customer name and another logical index, created with %in%, the rows to keep by status.

 dat <- read.table(text = '+--------------+---------------+-------------------+------------------------+---------------------+
| Project Name | Customer Name | Service Call.Name | Service Call Date Time | Service Call Status |
+--------------+---------------+-------------------+------------------------+---------------------+
| OO-99999     | A             | SC-001762         | 3/21/2022 7:00:00 PM   | Completed           |
| OO-99999     | A             | SC-002323         | null                   | Completed           |
| OO-99999     | A             | SC-002357         | 10/3/2022 7:00:00 PM   | 2nd Visit Scheduled |
| OO-88888     | B             | SC-001260         | 2/1/2022 8:00:00 PM    | Completed           |
| OO-88888     | B             | SC-002938         | 8/25/2022 7:00:00 PM   | Scheduled           |
| OO-55555     | C             | SC-000957         | 12/27/2021 8:00:00 PM  | Completed           |
| OO-55555     | C             | SC-001418         | 2/7/2022 4:30:00 PM    | Completed           |
| OO-55555     | C             | SC-003007         | null                   | null                |
| OO-11111     | D             | null              | null                   | null                |
+--------------+---------------+-------------------+------------------------+---------------------+
', header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE, check.names = FALSE)
dat <- dat[-c(1, ncol(dat))]
  
not_wanted <- c("Completed", "Not required")
  
i <- dat[['Service Call Status']] %in% not_wanted
i <- ave(i, dat[['Customer Name']], FUN = \(k) {
  if(all(k)) k[1] <- FALSE
  !k
})
result <- dat[i,]
j <- ave(result[['Service Call Status']], result[['Customer Name']], FUN = duplicated)
result <- result[!as.logical(j), ]
result
#>   Project Name Customer Name Service Call.Name Service Call Date Time Service Call Status
#> 3     OO-99999             A         SC-002357   10/3/2022 7:00:00 PM 2nd Visit Scheduled
#> 5     OO-88888             B         SC-002938   8/25/2022 7:00:00 PM           Scheduled
#> 8     OO-55555             C         SC-003007                   null                null
#> 9     OO-11111             D              null                   null                null

Created on 2022-10-26 with reprex v2.0.2

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • I only want one row per customer, here you have two rows for 3 of the customers. If my initial post didnt make sense please let me know and i will clarify. – Jacob Nordstrom Oct 26 '22 at 14:13
  • 1
    @JacobNordstrom You're right, I misread the question. I thought that you wanted only one `"Completed"` or `"not required"` per customer plus one of the others. Will edit, give me a moment. – Rui Barradas Oct 26 '22 at 14:25
  • 1
    @JacobNordstrom Done, see now. – Rui Barradas Oct 26 '22 at 15:11
  • This worked great, so great it made me realize i had other use cases that i didnt know i had. I edited the post so those are clear now. Any help would be appreciated. – Jacob Nordstrom Oct 26 '22 at 17:56
  • I am getting an error for some reason. "Error in split.default(x, g) : first argument must be a vector. " – Jacob Nordstrom Oct 26 '22 at 23:34