0

I have the below dataset, I am trying to find the number of days each machine was active and in sleep mode based on SwitchedOnDate.

 MachineID InstalledDate SwitchedOnDate Status
 1           2010-02-18    2010-02-19    SleepMode
 1           2010-02-18    2010-02-20    Active
 1           2010-02-18    2010-02-21    SleepMode
 1           2010-02-18    2010-02-22    Active
 2           2010-02-20    2010-02-21    Active
 2           2010-02-20    2010-02-22    SleepMode
 3           2010-02-10    2010-02-18    SleepMode
 4           2010-03-10    2010-03-15    Active

So my output should look something like this,

MachineID SleepModeDays ActiveDays
1          2             2
2          1             1
3          1             0
4          0             1
ds_user
  • 2,139
  • 4
  • 36
  • 71

2 Answers2

1

You can use count to group and count occurrences of each level of the last variable you pass it, and then spread to rearrange the data to wide form:

library(tidyverse)

df %>% count(MachineID, Status) %>% spread(Status, n, fill = 0L)

## Source: local data frame [4 x 3]
## Groups: MachineID [4]
## 
##   MachineID Active SleepMode
## *     <int>  <int>     <int>
## 1         1      2         2
## 2         2      1         1
## 3         3      0         1
## 4         4      1         0

A base R alternative:

tab <- table(df$MachineID, df$Status)

data.frame(MachineID = rownames(tab), as.data.frame.matrix(tab))

##   MachineID Active SleepMode
## 1         1      2         2
## 2         2      1         1
## 3         3      0         1
## 4         4      1         0

or for a very direct route,

janitor::crosstab(df, MachineID, Status)

##   MachineID Active SleepMode
## 1         1      2         2
## 2         2      1         1
## 3         3      0         1
## 4         4      1         0

Note these approachs makes a few assumptions, e.g. that you can't have multiple occurrences of the same combination of MachineID, SwitchedOnDate, and Status. If your data gets more complicated, account for such.

alistaire
  • 42,459
  • 4
  • 77
  • 117
0

Using data.table for aggregation and dcast function, here is another solution :

library(data.table)
data <-  "MachineID InstalledDate SwitchedOnDate Status
 1           2010-02-18    2010-02-19    SleepMode
 1           2010-02-18    2010-02-20    Active
 1           2010-02-18    2010-02-21    SleepMode
 1           2010-02-18    2010-02-22    Active
 2           2010-02-20    2010-02-21    Active
 2           2010-02-20    2010-02-22    SleepMode
 3           2010-02-10    2010-02-18    SleepMode
 4           2010-03-10    2010-03-15    Active"

 data <-  read.table(textConnection(data), header=TRUE)
 setDT(data)
 dcast(data[, .N, by=.(MachineID, Status)], MachineID ~ Status, fill=0, value.var="N")
Kumar Manglam
  • 2,780
  • 1
  • 19
  • 28
  • The answer mirrors the provided solution. In the code, SwitchedOnDate column does not play any role. Please clarify, if SwitchedOnDate column also needs to be considered. – Kumar Manglam Nov 14 '16 at 05:50
  • 1
    You don't need `.N` as `dcast` has a `fun.aggregate` parameter, which defaults to `length` (see [here](http://stackoverflow.com/questions/33051386/dcast-restructuring-from-long-to-wide-format-not-working/33051521#33051521) for an explanation). – Jaap Nov 14 '16 at 07:16
  • Hey Hi. Yes the idea is to find out "For each machine, how many different days it was in sleepmode and how many different days it was in active mode" – ds_user Nov 14 '16 at 21:06