0

I have the following data frame:

df <- data.frame(id = c("1","1","1","1","2","2","4","4"), 
             shift = c("1","1","1","2","1","2","1","1"), 
             measurement = c("A","B","C","D","A","B","A","C"), 
             critical_value = c("130","yes","55","1","115","no","110","54")) 

  id shift measurement critical_value
1  1     1           A            130
2  1     1           B            yes
3  1     1           C             55
4  1     2           D              1
5  2     1           A            115
6  2     2           B             no
7  4     1           A            110
8  4     1           C             54

How can transform this data so that every row in which it has the same id and shift to merge by different measurement with critical_value. So that it would look like this:

  id shift   A    B    C    D
1  1     1   130  yes  55   NA
4  1     2   NA   NA   NA   1
5  2     1   3    NA   NA   NA
6  2     2   NA   7    NA   NA
7  4     1   2    NA   5    NA

I tried using ifelse with group_by and summarise_all from the dplyr package. But this resulted in the following faulty output:

  id shift  A  B  C  D
1  1     1  4  8  6  NA
4  1     2 NA NA NA  1
5  2     1  3 NA NA  NA
6  2     2 NA  7 NA  NA
7  4     1  2 NA  5  NA

My code is:

library(dplyr)

df <- data.frame(id = c("1","1","1","1","2","2","4","4"), 
                 shift = c("1","1","1","2","1","2","1","1"), 
                 measurement = c("A","B","C","D","A","B","A","C"), 
                 critical_value = c("130","yes","55","1","115","no","110","54"))

df$A <- ifelse(df$measurement == "A", df$critical_value, NA)
df$B <- ifelse(df$measurement == "B", df$critical_value, NA)
df$C <- ifelse(df$measurement == "C", df$critical_value, NA)
df$D <- ifelse(df$measurement == "D", df$critical_value, NA)

my_fun <- function(x) x[!is.na(x)]

df %>%
  group_by("id","shift") %>%
  summarise_all(funs(my_fun))

df$critical_value <- NULL
df$measurement <- NULL

Notes: - Output doesn't have to be NA - Rownames don't have to be kept - Complete dataset is 7million lines so if you have any advice on how to iterate this fast, please!

  • I'm not sure I understand how you're combining some of the values to get that output. For example, how do you get a value of 7 from "no" combined at Id 2 shift 2. – jasbner Apr 20 '18 at 12:42
  • @jasbner Yeah I don't know either. I expected that the ifelse statement would return the value in df$critical_value. Instead it returned the iteration. – Hussain Rahiminejad Apr 20 '18 at 12:51

1 Answers1

1

Check out spread from tidyr package. Although I'm not sure my answer matches what you're intending to get. Spread takes a long format table and makes it wide.

library(tidyr)
spread(df, measurement, critical_value)

#   id shift    A    B    C    D
# 1  1     1  130  yes   55 <NA>
# 2  1     2 <NA> <NA> <NA>    1
# 3  2     1  115 <NA> <NA> <NA>
# 4  2     2 <NA>   no <NA> <NA>
# 5  4     1  110 <NA>   54 <NA>
jasbner
  • 2,253
  • 12
  • 24