1

I have the following data frame.

       employee task
t1        e4     t1
t2        e3     t2
t3        e5     t3
t4        e6     t4
t5        e5     t5
t6        e3     t6
t7        e6     t7
t8        e3     t8
t9        e1     t9
t10       e1    t10
t11       e3    t11
t12       e5    t12
t13       e1    t13
t14       e3    t14
t15       e6    t15
t16       e1    t16
t17       e4    t17
t18       e2    t18
t19       e3    t19
t20       e2    t20
t21       e4    t21
t22       e3    t22
t23       e6    t23
t24       e1    t24
t25       e6    t25
t26       e1    t26
t27       e4    t27
t28       e6    t28
t29       e3    t29
t30       e1    t30
t31       e2    t31
t32       e1    t32
t33       e4    t33
t34       e2    t34
t35       e3    t35
t36       e2    t36
t37       e6    t37
t38       e6    t38
t39       e1    t39
t40       e3    t40

I need to convert it in order to get a dataframe where there is a column for each employee filled with their respective tasks.

I tried to do that using

reshape(dfts, idvar = "employee",
    timevar = "task", direction = "wide")

but doesn´t work

Any idea?

Thanks

Martin
  • 41
  • 7
  • @markus Do you think this is different as it involves reshaping/spreading with **only** two columns? (Asking as I see a difference there, but it may just be because there's something I'm missing) – duckmayr Sep 09 '18 at 22:35
  • In that case - https://stackoverflow.com/questions/11322801/transpose-reshape-dataframe-without-timevar-from-long-to-wide-format – thelatemail Sep 09 '18 at 23:51
  • @thelatemail perfect – duckmayr Sep 10 '18 at 00:10

1 Answers1

0

Hey I had a crack to make a tidy solution. However, to get what you want you're going to end up with making a data frame with many NA's because each employee has a different number of tasks. Also, if you use spread directly you will end up with an error as follows: Error: Duplicate identifiers for rows.

To avoid this, you mutate an id column in and then you deselect it out after using spread.

I also wrote your DF into a csv to read it in.

library(tidyverse)

df <- read_csv("~/Desktop/dummy_data.csv") %>% 
  mutate(id = 1:n()) %>% 
  spread(employee, task) %>% 
  select(-id)

# A tibble: 40 x 6
   e1    e2    e3    e4    e5    e6   
   <chr> <chr> <chr> <chr> <chr> <chr>
 1 NA    NA    NA    t1    NA    NA   
 2 NA    NA    t2    NA    NA    NA   
 3 NA    NA    NA    NA    t3    NA   
 4 NA    NA    NA    NA    NA    t4   
 5 NA    NA    NA    NA    t5    NA   
 6 NA    NA    t6    NA    NA    NA   
 7 NA    NA    NA    NA    NA    t7   
 8 NA    NA    t8    NA    NA    NA   
 9 t9    NA    NA    NA    NA    NA   
10 t10   NA    NA    NA    NA    NA   
# ... with 30 more rows
Vivek Katial
  • 543
  • 4
  • 17
  • Thanks a lot! that crack helps a lot. Do you know any way to sort all the columns in order to put all these NA´s after the last task?. Thats all i need now if i use your crack. – Martin Sep 09 '18 at 23:19
  • Yep, you can use `lapply` to do this. However, you will infact be changing the structure of the dataframe itself. (i.e. The rows will now be different) here is the code to do it: `lapply(df, function(x) sort(x, na.last = TRUE)) %>% as_data_frame()` – Vivek Katial Sep 09 '18 at 23:57