How to count unique values in a data frame in R

Question

I am building a scheduler that calculates the number of hours each person works for a week. The dataframe looks like this:

>df
  Shift Monday Tuesday Wednesday Thursday Friday Saturday Sunday
1 09-12      a       c         a        c      b        b      b
2 12-15      b       d         b        d      a        a      e
3 15-18      c       e         c        e      d        e      f
4 18-21      d       f         e        f      e        f      a
5 21-24      e       a         d        d      c        d      d
6 24-03      f       b         f        e      a        b      b
7 03-06      a       c         a        a      b        a      e
8 06-09      b       d         b        f      d        e      f

Additionally, I would like to have people who serve Shift 24-03 to have 4 hours instead of 3 hours. So the result would look something like this:

name hours
a       30
b       34
c       32
d       31.5
e       34
f       33

refer **https://stackoverflow.com/questions/12840294/counting-unique-distinct-values-by-group-in-a-data-frame** — Night Programmer, Mar 22 '18 at 05:58
How did you get 31.5 hours for person `d` when every shift is either 3 or 4 hours? — lebelinoz, Mar 22 '18 at 06:08
Does the answer need to use `dplyr`? If not, then please remove that tag. — lebelinoz, Mar 22 '18 at 06:17

score 2 · Answer 1 · answered Mar 22 '18 at 06:10

We can gather into 'long' format, separate the 'shift' into numeric columns, then grouped by 'name', get the difference of the two columns and sum it

library(tidyverse)
gather(df, key, name, -Shift) %>%
     separate(Shift, into = c("Start", "End"), convert = TRUE) %>% 
     mutate(End = ifelse(End < Start, Start + End, End)) %>% 
     group_by(name) %>% 
     summarise(hours = sum(End - Start))

score 0 · Answer 2 · answered Mar 22 '18 at 06:12

If you make sure your columns are strings and not factors, you can unlist the week-day columns to get a plain vector and then use table to count

df <- read.table(text="Shift Monday Tuesday Wednesday Thursday Friday Saturday Sunday
1 09-12      a       c         a        c      b        b      b
2 12-15      b       d         b        d      a        a      e
3 15-18      c       e         c        e      d        e      f
4 18-21      d       f         e        f      e        f      a
5 21-24      e       a         d        d      c        d      d
6 24-03      f       b         f        e      a        b      b
7 03-06      a       c         a        a      b        a      e
8 06-09      b       d         b        f      d        e      f",
                 stringsAsFactors = FALSE)

plain_vec <- unlist(df[,2:8], use.names = FALSE)

This gets you to

> table(plain_vec)
plain_vec
 a  b  c  d  e  f 
11 11  6 10 10  8

To count hours, you can replicate rows as many times as you have hours adn adjust the 24-03 by adding an extra row for that.

> table(unlist(df[c(rep(c(1:5,7:8), each=3), rep(8,4)), 2:8], use.names=FALSE))

 a  b  c  d  e  f 
30 32 18 38 31 26

The use.names=FALSE is just something I usually do with unlist. You don't need it, but usually your code is much faster if vectors do not have to carry their names along with them.

score 0 · Accepted Answer · answered Mar 22 '18 at 06:13

I think gather from the tidyr package will shape the data into a form you want:

> df1 = df %>% tidyr::gather(key = "weekday", value = "name", -Shift)
   Shift   weekday name
1  09-12    Monday    a
2  12-15    Monday    b
3  15-18    Monday    c
4  18-21    Monday    d
...

Then you can add on a hours column using mutate and ifelse:

df2 = df1 %>% mutate(hours = ifelse(Shift == "24-03", 4, 3))

And the answer is a simple group_by / summarise combo:

answer = df2 %>% group_by(name) %>% summarise(hours = sum(hours))

The answer dataframe will look like this:

name hours
-----------
a    34
b    36
c    18
d    30
e    31
f    26

This isn't quite what you wanted, but I think there's something fishy about your sample data. How can someone have 31.5 hours?

@levelinoz I was able to get the result you got for answer a couple days ago, but I am getting just the sum of hours column without name now... Has there been any changes in the tidyr package? — user9532692, Mar 26 '18 at 10:20
@user9532692 I don't think so. This might require you to ask a new question — lebelinoz, Mar 26 '18 at 21:51

How to count unique values in a data frame in R

3 Answers3