
I have the data frame below, and for each day I want to highlight which employees were outliers in terms of time spent.

Emp_Id 3 is consistently an outlier among all employees on the 1st, 2nd and 3rd of January. My actual dataset has thousands of employees.

How can I show these outliers visually in a plot?

df <- data.frame(date = as.Date(c("2020-01-01","2020-01-01","2020-01-01","2020-01-01",
                                  "2020-01-02","2020-01-02","2020-01-02","2020-01-02",
                                  "2020-01-03","2020-01-03","2020-01-03","2020-01-03")),
                 Emp_Id = c(1,2,3,4,1,2,3,4,1,2,3,4),
                 time = c(5,2,80,3,3,1,90,80,5,6,75,7))

       date Emp_Id time
2020-01-01      1    5
2020-01-01      2    2
2020-01-01      3   80
2020-01-01      4    3
2020-01-02      1    3
2020-01-02      2    1
2020-01-02      3   90
2020-01-02      4   80
2020-01-03      1    5
2020-01-03      2    6
2020-01-03      3   75
2020-01-03      4    7
joy_1379

1 Answer


The answer depends on the metric you choose and how you define an outlier. Here is an example that flags employees who spend more than twice the mean time for that day. You can build on this with other thresholds, e.g. more than the mean time, more than twice the mean time, etc. The important thing is to choose a meaningful metric.
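As one illustration of swapping in a different metric, here is a minimal sketch that uses the boxplot rule (third quartile plus 1.5 times the interquartile range) as the per-day threshold. This is not part of the original example: `upper_fence` is a hypothetical helper, and the multiplier `k = 1.5` is only the conventional default, so treat both as assumptions to adapt.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  date   = as.Date(rep(c("2020-01-01", "2020-01-02", "2020-01-03"), each = 4)),
  Emp_Id = rep(1:4, times = 3),
  time   = c(5, 2, 80, 3, 3, 1, 90, 80, 5, 6, 75, 7)
)

# Hypothetical helper (an assumption, not from the answer):
# upper fence of the boxplot rule, Q3 + k * IQR
upper_fence <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  q[2] + k * (q[2] - q[1])
}

# Same structure as the mean-based approach, only the metric changes
iqr_data <- df %>%
  group_by(date) %>%
  mutate(metric  = upper_fence(time),
         outlier = time > metric) %>%
  ungroup()

# Rows flagged under this rule
filter(iqr_data, outlier)
```

Different metrics can flag different rows. On 2020-01-02, where two of the four values are large, the quartiles are pulled upwards and this rule flags nobody, whereas the twice-the-mean rule still flags Emp_Id 3. With only four observations per day any quantile-based rule is fragile, so it is worth checking it on the full dataset.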

In the example below, only the outliers are labeled, and a horizontal line marks the threshold above which a point counts as an outlier.

# Example data from question
df <- data.frame(date = as.Date(c("2020-01-01","2020-01-01","2020-01-01","2020-01-01",
                                  "2020-01-02","2020-01-02","2020-01-02","2020-01-02",
                                  "2020-01-03","2020-01-03","2020-01-03","2020-01-03")),
                 Emp_Id = c(1,2,3,4,1,2,3,4,1,2,3,4),
                 time = c(5,2,80,3,3,1,90,80,5,6,75,7))

library(dplyr)
library(ggplot2)

# Flag outliers per day: more than twice that day's mean time
emp_data <- df %>%
  mutate(date = as.factor(date)) %>%   # treat each day as a discrete category
  group_by(date) %>%
  mutate(metric = mean(time) * 2,      # per-day threshold
         outlier = time > metric) %>%  # comparison is already logical
  ungroup()

# Visualize it: one panel per day, outliers labeled, threshold drawn as a line
ggplot(data = emp_data, aes(x = date, y = time, col = outlier, group = date)) +
  geom_point() +
  geom_text(data = filter(emp_data, outlier), aes(label = Emp_Id), hjust = 2, vjust = 0) +
  facet_wrap(~date, scales = "free") +
  geom_hline(aes(yintercept = metric)) +
  labs(x = "Date", y = "Time", col = "Outlier") +
  theme_classic()

Created on 2021-04-09 by the reprex package (v0.3.0)
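With thousands of employees, a per-day plot with labels may get crowded, so it can help to summarise first. The sketch below reuses the same twice-the-mean rule and counts on how many days each employee was flagged. The column names `days_flagged` and `days_observed` are made up for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  date   = as.Date(rep(c("2020-01-01", "2020-01-02", "2020-01-03"), each = 4)),
  Emp_Id = rep(1:4, times = 3),
  time   = c(5, 2, 80, 3, 3, 1, 90, 80, 5, 6, 75, 7)
)

flag_counts <- df %>%
  group_by(date) %>%
  mutate(outlier = time > 2 * mean(time)) %>%  # per-day flag, same rule as above
  group_by(Emp_Id) %>%                         # regroup by employee
  summarise(days_flagged  = sum(outlier),
            days_observed = n()) %>%
  filter(days_flagged > 0) %>%                 # keep only employees flagged at least once
  arrange(desc(days_flagged))

flag_counts
```

For the example data this leaves a single row, Emp_Id 3, flagged on all 3 days. On the full dataset the resulting table could feed a bar chart of the most frequently flagged employees, or be used to restrict the plot above to those employees.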

mhovd