1

I have data in this format:

   boss employee1 employee2
1   wil     james      andy
2 james      dean      bert
3 billy      herb    collin
4  tony      mike     david

and I would like it in this format:

   boss employee
1   wil    james
2   wil     andy
3 james     dean
4 james     bert
5 billy     herb
6 billy   collin
7  tony     mike
8  tony    david

I have searched the forums, but I have not yet found anything that helps. I have tried using dplyr and some others, but I am still pretty new to R.

If this question has been answered and you could give me a link that would be greatly appreciated.

Thanks,

Wil

Tucktuckgoose
  • 73
  • 1
  • 7
  • Hey SO moderators, can we get a perma-duplicate-flag for wide to long reshaping? This comes up **all the time** – alexwhitworth Jul 13 '17 at 23:03
  • 1
    I will say that after reviewing the link that you posted, the answer provided by Scarabee and Nick Criswell are much clearer than the linked answers. As a relatively new person to R, the linked post is dense and has a lot of code, while the answers provided here are much clearer and they are also concise. – Tucktuckgoose Jul 13 '17 at 23:16
  • Two base R alternatives: using `reshape` along the lines of tyler-rink's answer in the linked post: `reshape(df, direction="long", idvar="boss", varying=2:3, sep="")[-2]`. Then brute force style with `stack` and `cbind` is `cbind(boss=df[1], employees=stack(df[-1])$values)`. This returns a warning, but works out OK. – lmo Jul 14 '17 at 00:56
  • @Tucktuckgoose I do understand and appreciate your comment; and I did not find the duplicate I had hoped to--which is more clearly explained. **But:** one of the policies of usage of this community is to first search for your answer and not simply post a duplicate question. New questions should add something to knowledge base not simply complicate the process of finding good answers... **That is, the ideal state is a single, easily findable, high quality answer. It is not numerous lower quality, hard to find, answers.** Good luck as you continue learning R. – alexwhitworth Jul 15 '17 at 15:49

2 Answers2

1

Here is a solution that uses tidyr. Specifically, the gather function is used to combine the two employee columns. This also generates a column bsaed on the column headers (employee1 and employee2) which is called key. We remove that with select from dplyr.

library(tidyr)
library(dplyr)

df <- read.table(
      text = "boss employee1 employee2
      1   wil     james      andy
      2 james      dean      bert
      3 billy      herb    collin
      4  tony      mike     david",
      header = TRUE,
      stringsAsFactors = FALSE
    )


    df2 <- df %>%
      gather(key, employee, -boss) %>%
      select(-key)

> df2
   boss employee
1   wil    james
2 james     dean
3 billy     herb
4  tony     mike
5   wil     andy
6 james     bert
7 billy   collin
8  tony    david

I would be shocked if there isn't a slicker, base solution but this should work for you.

Nick Criswell
  • 1,733
  • 2
  • 16
  • 32
1

Using base R:

df1 <- df[, 1:2]
df2 <- df[, c(1, 3)]
names(df1)[2] <- names(df2)[2] <- "employee"
rbind(df1, df2)
#     boss employee
# 1    wil    james
# 2  james     dean
# 3  billy     herb
# 4   tony     mike
# 11   wil     andy
# 21 james     bert
# 31 billy   collin
# 41  tony    david

Using dplyr:

df %>% 
  select(boss, employee1) %>% 
  rename(employee = employee1) %>% 
  bind_rows(df %>% 
              select(boss, employee2) %>% 
              rename(employee = employee2))
#    boss employee
# 1   wil    james
# 2 james     dean
# 3 billy     herb
# 4  tony     mike
# 5   wil     andy
# 6 james     bert
# 7 billy   collin
# 8  tony    david

Data:

df <- read.table(text = "
   boss employee1 employee2
1   wil     james      andy
2 james      dean      bert
3 billy      herb    collin
4  tony      mike     david                 
                 ", header = TRUE, stringsAsFactors = FALSE)
Scarabee
  • 5,437
  • 5
  • 29
  • 55