
Below is a command that produces the desired output from a given input.

The input is the data frame all.fly.write:

$GENE           $HUMAN_ORTHOLOG
14-3-3epsilon   YWHAE
14-3-3epsilon   YWHAQ
14-3-3epsilon   YWHAH  
140up           TIMMDC1
26-29-p         CTSF
26-29-p         CTSL

The desired output is:

$GENE           $HUMAN_ORTHOLOG
14-3-3epsilon   YWHAE,YWHAQ,YWHAH  
140up           TIMMDC1
26-29-p         CTSF,CTSL
    

Below is the command:

output <- ddply(all.fly.write, .(GENE), summarize, matching.Human.Symbol = toString(HUMAN_ORTHOLOG))

I read the ddply documentation but am still quite confused: https://www.rdocumentation.org/packages/plyr/versions/1.8.6/topics/ddply

The .(GENE) part groups the data by the GENE column.

As for summarize, I cannot find an argument named summarize in the documentation, so how is it used here?

As for matching.Human.Symbol, this name does not appear anywhere else in the surrounding code; it only appears here. What is the role of this argument?

Thanks.

mendel

1 Answer

Try using dplyr:

library(readr)  # provides read_table()
library(dplyr)  # provides %>%, group_by() and summarise()

read_table("$GENE           $HUMAN_ORTHOLOG
 14-3-3epsilon   YWHAE
 14-3-3epsilon   YWHAQ
 14-3-3epsilon   YWHAH  
 140up           TIMMDC1
 26-29-p         CTSF
 26-29-p         CTSL") %>% 
   group_by(`$GENE`) %>% 
   summarise(`$HUMAN_ORTHOLOG` = glue::glue_collapse(`$HUMAN_ORTHOLOG`,", ") %>% as.character)
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
  `$GENE`       `$HUMAN_ORTHOLOG`  
  <chr>         <chr>              
1 14-3-3epsilon YWHAE, YWHAQ, YWHAH
2 140up         TIMMDC1            
3 26-29-p       CTSF, CTSL  
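
For comparison, here is how the original ddply call from the question can be read. In ddply's signature the third argument is `.fun`, so `summarize` is simply passed there as the function to apply to each group. Anything after that goes through `...` and is forwarded to `summarize`, which treats `matching.Human.Symbol = toString(HUMAN_ORTHOLOG)` as "create a column with this name from this expression". The sketch below is only an illustration: it assumes plyr is installed and that the columns are named GENE and HUMAN_ORTHOLOG (without the `$` prefix shown in the printed table), and it rebuilds the toy data by hand.

```r
library(plyr)

# Toy copy of the question's input.
# Assumption: the column names carry no `$` prefix.
all.fly.write <- data.frame(
  GENE = c("14-3-3epsilon", "14-3-3epsilon", "14-3-3epsilon",
           "140up", "26-29-p", "26-29-p"),
  HUMAN_ORTHOLOG = c("YWHAE", "YWHAQ", "YWHAH", "TIMMDC1", "CTSF", "CTSL"),
  stringsAsFactors = FALSE
)

# Same call as in the question, with the positional arguments spelled out.
output <- ddply(
  all.fly.write,
  .variables = .(GENE),   # grouping column(s)
  .fun = summarize,       # function applied to each group
  # Forwarded to summarize(): names the new column and says how to fill it.
  matching.Human.Symbol = toString(HUMAN_ORTHOLOG)
)

print(output)
```

Note that `toString()` joins values with a comma followed by a space. If the result should have no space after the comma, as in the desired output in the question, `paste(HUMAN_ORTHOLOG, collapse = ",")` can be used in its place.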
jyjek