I am trying to use melt and cast to transform this data frame

 knowngene                                           Meth
 uc003fia.3 cg00000108;0.864484486796394;0.928944704280193
 uc003cha.4 cg00000108;0.864484486796394;0.928944704280193
 uc003fhz.4 cg00000109;0.881060551674426;0.910939682196076
 uc003fhz.4 cg00000132;0.881060551674426;0.910939682196076
 uc003fia.3 cg00000109;0.881060551674426;0.910939682196076
 uc003fia.3 cg00000236;0.799251070221749;0.898656886868738

into something like this

 knowngene                                           Meth
 uc003fia.3 cg00000108;0.864484486796394;0.928944704280193;cg00000109;0.881060551674426;0.910939682196076;cg00000236;0.799251070221749;0.898656886868738
 uc003cha.4 cg00000108;0.864484486796394;0.928944704280193
 uc003fhz.4 cg00000109;0.881060551674426;0.910939682196076;cg00000132;0.881060551674426;0.910939682196076

But for some reason I couldn't reshape the data frame. Should I maybe convert it to a list first?

user976991

3 Answers

Split and apply will get you close:

res <- lapply(split(x$Meth, x$knowngene), paste, collapse="; ")
res

$uc003cha.4
[1] "cg00000108;0.864484486796394;0.928944704280193"

$uc003fhz.4
[1] "cg00000109;0.881060551674426;0.910939682196076; cg00000132;0.881060551674426;0.910939682196076"

$uc003fia.3
[1] "cg00000108;0.864484486796394;0.928944704280193; cg00000109;0.881060551674426;0.910939682196076; cg00000236;0.799251070221749;0.898656886868738"

The result is a named list with the text for each gene concatenated in the way you wanted. It is stored under a new name (res here) so that it does not get confused with the original data frame x. You can convert it to a data frame using names() and unname():

data.frame(knowngene=names(res), Meth=unlist(unname(res)))

   knowngene
1 uc003cha.4
2 uc003fhz.4
3 uc003fia.3
                                                                                                                                            Meth
1                                                                                                 cg00000108;0.864484486796394;0.928944704280193
2                                                 cg00000109;0.881060551674426;0.910939682196076; cg00000132;0.881060551674426;0.910939682196076
3 cg00000108;0.864484486796394;0.928944704280193; cg00000109;0.881060551674426;0.910939682196076; cg00000236;0.799251070221749;0.898656886868738
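
As a self-contained sketch of the same split-and-paste idea (the names x, res and out are illustrative, and the sample rows are retyped from the question), sapply() returns a named character vector directly, and collapse = ";" without the space matches the separator in the desired output:

```r
# Sample data retyped from the question; stringsAsFactors = FALSE keeps text as character
x <- data.frame(
  knowngene = c("uc003fia.3", "uc003cha.4", "uc003fhz.4",
                "uc003fhz.4", "uc003fia.3", "uc003fia.3"),
  Meth = c("cg00000108;0.864484486796394;0.928944704280193",
           "cg00000108;0.864484486796394;0.928944704280193",
           "cg00000109;0.881060551674426;0.910939682196076",
           "cg00000132;0.881060551674426;0.910939682196076",
           "cg00000109;0.881060551674426;0.910939682196076",
           "cg00000236;0.799251070221749;0.898656886868738"),
  stringsAsFactors = FALSE
)

# Split Meth by gene, then collapse each group into one ";"-separated string
res <- sapply(split(x$Meth, x$knowngene), paste, collapse = ";")

# Rebuild a two-column data frame from the named character vector
out <- data.frame(knowngene = names(res), Meth = unname(res),
                  stringsAsFactors = FALSE)
print(out)
```

Note that split() orders the groups alphabetically by gene name, not by first appearance in the input.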
Andrie

Try cast() with paste as the aggregating function:

cast(data = your.data.frame, formula = knowngene ~ ., value = "Meth", 
    fun.aggregate = paste, collapse = ";")
Thierry

It sounds like you just need aggregate():

First, your data:

myDF <- read.table(header = TRUE, text = "knowngene   Meth
uc003fia.3 cg00000108;0.864484486796394;0.928944704280193
uc003cha.4 cg00000108;0.864484486796394;0.928944704280193
uc003fhz.4 cg00000109;0.881060551674426;0.910939682196076
uc003fhz.4 cg00000132;0.881060551674426;0.910939682196076
uc003fia.3 cg00000109;0.881060551674426;0.910939682196076
uc003fia.3 cg00000236;0.799251070221749;0.898656886868738")

Second, the aggregation:

aggregate(Meth ~ knowngene, myDF, paste, collapse=";")
#    knowngene                                                                                                                                         Meth
# 1 uc003cha.4                                                                                               cg00000108;0.864484486796394;0.928944704280193
# 2 uc003fhz.4                                                cg00000109;0.881060551674426;0.910939682196076;cg00000132;0.881060551674426;0.910939682196076
# 3 uc003fia.3 cg00000108;0.864484486796394;0.928944704280193;cg00000109;0.881060551674426;0.910939682196076;cg00000236;0.799251070221749;0.898656886868738
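
If the genes should appear in the order they first occur in the input, as in the desired output in the question, one option is to turn knowngene into a factor with explicit levels before aggregating. This is only a sketch: it relies on aggregate() returning groups in factor-level order, and myDF is rebuilt here with data.frame() so that the block runs on its own (out is an illustrative name).

```r
# Sample data retyped from the question
myDF <- data.frame(
  knowngene = c("uc003fia.3", "uc003cha.4", "uc003fhz.4",
                "uc003fhz.4", "uc003fia.3", "uc003fia.3"),
  Meth = c("cg00000108;0.864484486796394;0.928944704280193",
           "cg00000108;0.864484486796394;0.928944704280193",
           "cg00000109;0.881060551674426;0.910939682196076",
           "cg00000132;0.881060551674426;0.910939682196076",
           "cg00000109;0.881060551674426;0.910939682196076",
           "cg00000236;0.799251070221749;0.898656886868738"),
  stringsAsFactors = FALSE
)

# Set the factor levels to the order of first appearance
myDF$knowngene <- factor(myDF$knowngene, levels = unique(myDF$knowngene))

# Groups now come back in level order instead of alphabetical order
out <- aggregate(Meth ~ knowngene, myDF, paste, collapse = ";")
print(out)
```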
A5C1D2H2I1M1N2O1R2T1