I have a data frame like this:

TargetID        Gene
cg26365299        HOXA9
cg26476852        HOXA9
cg26492446      BHLHE23
cg26521404        HOXA9
cg26531174         CDX1
cg26595643         VAX1

And I want to reshape it into this:

Gene         TargetID
HOXA9        cg26365299;cg26476852;cg26521404
BHLHE23      cg26492446
CDX1         cg26531174
VAX1         cg26595643

I tried dcast, but it doesn't work.

user976991

3 Answers

Use aggregate. Assuming df is your data.frame:

> aggregate(TargetID~Gene, data=df, paste0, collapse=";")
     Gene                         TargetID
1 BHLHE23                       cg26492446
2    CDX1                       cg26531174
3   HOXA9 cg26365299;cg26476852;cg26521404
4    VAX1                       cg26595643
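
For anyone who wants to run this end to end, here is a minimal sketch. The `df` below is my own reconstruction of the data shown in the question (an assumption, since the original object was not posted), and I pass `paste` as `FUN` explicitly, which behaves the same as `paste0` here because `collapse` does the joining.

```r
# Hypothetical reconstruction of the data frame from the question
df <- data.frame(
  TargetID = c("cg26365299", "cg26476852", "cg26492446",
               "cg26521404", "cg26531174", "cg26595643"),
  Gene = c("HOXA9", "HOXA9", "BHLHE23", "HOXA9", "CDX1", "VAX1"),
  stringsAsFactors = FALSE
)

# Collapse all TargetID values within each Gene into one ";"-separated string;
# extra arguments such as collapse are forwarded to FUN
res <- aggregate(TargetID ~ Gene, data = df, FUN = paste, collapse = ";")
print(res)
```

The result has one row per gene, sorted by Gene, not in the order the genes first appear in the input.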
Jilber Urbina

Another possibility, using unstack:

ll <- lapply(unstack(df), paste0, collapse = ";")
data.frame(Gene = names(ll), TargetID = unlist(ll), row.names = NULL)

#      Gene                         TargetID
# 1 BHLHE23                       cg26492446
# 2    CDX1                       cg26531174
# 3   HOXA9 cg26365299;cg26476852;cg26521404
# 4    VAX1                       cg26595643
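
To see what the intermediate step does, here is a small sketch. The `df` is a hypothetical reconstruction of the question's data, and `groups` is just a name I picked for the intermediate result. It relies on `unstack()` falling back to `formula(df)`, which for a two-column data frame is `TargetID ~ Gene`.

```r
# Hypothetical reconstruction of the data frame from the question
df <- data.frame(
  TargetID = c("cg26365299", "cg26476852", "cg26492446",
               "cg26521404", "cg26531174", "cg26595643"),
  Gene = c("HOXA9", "HOXA9", "BHLHE23", "HOXA9", "CDX1", "VAX1"),
  stringsAsFactors = FALSE
)

# With no formula given, unstack() splits the first column by the second;
# the groups have unequal lengths here, so it returns a named list
groups <- unstack(df)
str(groups)

# Each list element (one per gene) is then collapsed into a single string
ll <- lapply(groups, paste0, collapse = ";")
```

The list names are the genes and the list values are the collapsed TargetIDs, which is worth keeping in mind when labelling the columns of the final data frame.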
Henrik

Another option using plyr:

library(plyr)
ddply(df, .(Gene), summarise, TargetID = paste(TargetID, collapse = ";"))
     Gene                         TargetID
1 BHLHE23                       cg26492446
2    CDX1                       cg26531174
3   HOXA9 cg26365299;cg26476852;cg26521404
4    VAX1                       cg26595643
agstudy