1

I am building a scheduler that calculates the number of hours each person works for a week. The dataframe looks like this:

>df
  Shift Monday Tuesday Wednesday Thursday Friday Saturday Sunday
1 09-12      a       c         a        c      b        b      b
2 12-15      b       d         b        d      a        a      e
3 15-18      c       e         c        e      d        e      f
4 18-21      d       f         e        f      e        f      a
5 21-24      e       a         d        d      c        d      d
6 24-03      f       b         f        e      a        b      b
7 03-06      a       c         a        a      b        a      e
8 06-09      b       d         b        f      d        e      f

Additionally, I would like to have people who serve Shift 24-03 to have 4 hours instead of 3 hours. So the result would look something like this:

name hours
a       30
b       34
c       32
d       31.5
e       34
f       33
user9532692
  • 584
  • 7
  • 28

3 Answers3

2

We can gather into 'long' format, separate the 'shift' into numeric columns, then grouped by 'name', get the difference of the two columns and sum it

library(tidyverse)
gather(df, key, name, -Shift) %>%
     separate(Shift, into = c("Start", "End"), convert = TRUE) %>% 
     mutate(End = ifelse(End < Start, Start + End, End)) %>% 
     group_by(name) %>% 
     summarise(hours = sum(End - Start))
akrun
  • 874,273
  • 37
  • 540
  • 662
0

If you make sure your columns are strings and not factors, you can unlist the week-day columns to get a plain vector and then use table to count

df <- read.table(text="Shift Monday Tuesday Wednesday Thursday Friday Saturday Sunday
1 09-12      a       c         a        c      b        b      b
2 12-15      b       d         b        d      a        a      e
3 15-18      c       e         c        e      d        e      f
4 18-21      d       f         e        f      e        f      a
5 21-24      e       a         d        d      c        d      d
6 24-03      f       b         f        e      a        b      b
7 03-06      a       c         a        a      b        a      e
8 06-09      b       d         b        f      d        e      f",
                 stringsAsFactors = FALSE)

plain_vec <- unlist(df[,2:8], use.names = FALSE)

This gets you to

> table(plain_vec)
plain_vec
 a  b  c  d  e  f 
11 11  6 10 10  8 

To count hours, you can replicate rows as many times as you have hours adn adjust the 24-03 by adding an extra row for that.

> table(unlist(df[c(rep(c(1:5,7:8), each=3), rep(8,4)), 2:8], use.names=FALSE))

 a  b  c  d  e  f 
30 32 18 38 31 26 

The use.names=FALSE is just something I usually do with unlist. You don't need it, but usually your code is much faster if vectors do not have to carry their names along with them.

Thomas Mailund
  • 1,674
  • 10
  • 16
0

I think gather from the tidyr package will shape the data into a form you want:

> df1 = df %>% tidyr::gather(key = "weekday", value = "name", -Shift)
   Shift   weekday name
1  09-12    Monday    a
2  12-15    Monday    b
3  15-18    Monday    c
4  18-21    Monday    d
...

Then you can add on a hours column using mutate and ifelse:

df2 = df1 %>% mutate(hours = ifelse(Shift == "24-03", 4, 3))

And the answer is a simple group_by / summarise combo:

answer = df2 %>% group_by(name) %>% summarise(hours = sum(hours))

The answer dataframe will look like this:

name hours
-----------
a    34
b    36
c    18
d    30
e    31
f    26

This isn't quite what you wanted, but I think there's something fishy about your sample data. How can someone have 31.5 hours?

lebelinoz
  • 4,890
  • 10
  • 33
  • 56
  • @levelinoz I was able to get the result you got for answer a couple days ago, but I am getting just the sum of hours column without name now... Has there been any changes in the tidyr package? – user9532692 Mar 26 '18 at 10:20
  • @user9532692 I don't think so. This might require you to ask a new question – lebelinoz Mar 26 '18 at 21:51