I have a data frame in R that I would like to re-organize. Consider the following:

samples=c("167_1", "167_2", "167_3", "167_4", "167_5", "167_6", "167_7", "167_8", "167_9", "167_10", "167_11", "167_12", "167_13", "167_14", "167_15")
condition=c("Group4", "Group7", "Group8", "Group3", "Group4", "Group2", "Group6", "Group1", "Group2", "Group9", "Group7", "Group8", "Group3", "Group5", "Group5")
df=data.frame(samples, condition)

This gives the following:

> head(df)
  samples condition
1   167_1    Group4
2   167_2    Group7
3   167_3    Group8
4   167_4    Group3
5   167_5    Group4
6   167_6    Group2

I would like to re-organize the data so that there is one row per condition, with its samples listed together:

condition  samples     
Group1     167_8
Group2     167_6, 167_9
Group3     167_13, 167_4
Group4     167_1, 167_5
Group5     167_14, 167_15
Group6     167_7
Group7     167_11, 167_2
Group8     167_12, 167_3
Group9     167_10

I've tried using reshape2, and I can get from the long to the wide format, but I'm not sure how to get from the resulting mess of NAs to a summarized list.

library(reshape2)
dcast(df, condition ~ samples)

Any help would be greatly appreciated. Thank you.

Adrian Reich

1 Answer

You can do this with dplyr as follows:

library(dplyr)

df %>%
  group_by(condition) %>%
  summarise(samples = paste(samples, collapse = ", "))

Result:

# A tibble: 9 × 2
  condition        samples
     <fctr>          <chr>
1    Group1          167_8
2    Group2   167_6, 167_9
3    Group3  167_4, 167_13
4    Group4   167_1, 167_5
5    Group5 167_14, 167_15
6    Group6          167_7
7    Group7  167_2, 167_11
8    Group8  167_3, 167_12
9    Group9         167_10
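
If dplyr is not available, a similar result can be sketched in base R with aggregate(). This is an alternative technique, not the one used in the dplyr code above. The name collapsed is illustrative, and sort() is added on the assumption that the string-sorted order shown in the question (for example 167_13 before 167_4) is wanted.

```r
# Rebuild the example data from the question.
samples <- c("167_1", "167_2", "167_3", "167_4", "167_5", "167_6", "167_7",
             "167_8", "167_9", "167_10", "167_11", "167_12", "167_13",
             "167_14", "167_15")
condition <- c("Group4", "Group7", "Group8", "Group3", "Group4", "Group2",
               "Group6", "Group1", "Group2", "Group9", "Group7", "Group8",
               "Group3", "Group5", "Group5")
df <- data.frame(samples, condition)

# Collapse the samples within each condition into one comma-separated string.
# as.character() guards against factor columns (the data.frame default
# before R 4.0); sort() orders the sample names as strings (assumption:
# that ordering is desired, drop it to keep order of appearance).
collapsed <- aggregate(
  samples ~ condition,
  data = df,
  FUN = function(x) paste(sort(as.character(x)), collapse = ", ")
)

print(collapsed)
```

Dropping sort() keeps the samples in their order of appearance, as in the dplyr result above.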
ulfelder

Comment: Or you can use `nest`: `df_new = df %>% nest(samples)`, which gives you a variable-length tibble for each group. – lbusett Jan 31 '17 at 21:54
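
In tidyr 1.0.0 and later, nest() expects the nested columns to be named, so the call in the comment would be written roughly as below. This is a minimal sketch, assuming the df from the question and a recent tidyr; the output column name data and the variable nested are choices made here for illustration.

```r
library(tidyr)

# Rebuild the example data from the question.
samples <- c("167_1", "167_2", "167_3", "167_4", "167_5", "167_6", "167_7",
             "167_8", "167_9", "167_10", "167_11", "167_12", "167_13",
             "167_14", "167_15")
condition <- c("Group4", "Group7", "Group8", "Group3", "Group4", "Group2",
               "Group6", "Group1", "Group2", "Group9", "Group7", "Group8",
               "Group3", "Group5", "Group5")
df <- data.frame(samples, condition)

# Nest the samples column into a list-column called `data`
# (illustrative name); the remaining column, condition, defines the groups.
nested <- nest(df, data = samples)

# Each element of nested$data is a small tibble holding that group's samples.
print(nested)
```

Unlike the paste() approach, this keeps each group's samples as separate values rather than a single string, which can be easier to work with in later steps.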