
I have a vector of gene names in which several elements contain more than one gene name, separated by a comma. How can I split these elements and get one long vector with each gene name as a separate element? I have tried strsplit, but that just gives me the two or more gene names as separate strings that are still grouped in the same element... /Frida

genes = c("PGD", "CDA", "MROH7,TTC4", "PGM1") 

and I want to separate the element "MROH7,TTC4" into the two elements "MROH7" and "TTC4"

Tyler Rinker
user3346285
  • Welcome to SO. Could you please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – sgibb May 21 '14 at 19:42
  • Hi, the vector looks like this: genes = c("PGD", "CDA", "MROH7,TTC4", "PGM1") and I want to separate the element "MROH7,TTC4" into the two elements "MROH7" and "TTC4". – user3346285 May 21 '14 at 19:45

2 Answers

This will split your string at every comma:

genes = c("PGD", "CDA", "MROH7,TTC4", "PGM1")
# strsplit() returns a list; unlist() flattens it into a single character vector
genes.split = unlist(strsplit(genes, ","))

genes.split
# [1] "PGD"   "CDA"   "MROH7" "TTC4"  "PGM1"
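
To see why strsplit on its own was not enough, it may help to look at the intermediate result. The following is a minimal sketch, not part of the original answer; the names `parts` and `genes_flat` are made up for illustration.

```r
genes <- c("PGD", "CDA", "MROH7,TTC4", "PGM1")

# strsplit() returns a list with one character vector per input element,
# so "MROH7" and "TTC4" stay together in the third list element.
# fixed = TRUE treats "," as a literal string rather than a regex.
parts <- strsplit(genes, ",", fixed = TRUE)

# Number of gene names found in each original element
lengths(parts)

# unlist() concatenates the list elements into one flat character vector
genes_flat <- unlist(parts)
genes_flat
```

Here `lengths(parts)` shows that only the third element held two names, and `genes_flat` has five elements instead of four.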
eipi10

Another option is scan, which will also eat white space.

scan(text=genes, what='', sep=',', strip.white=TRUE)
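
The whitespace handling only matters when the input has spaces around the commas, which the vector in the question does not. As a rough illustration, assuming a hypothetical vector `genes_ws` with a space after the comma, the approaches compare like this:

```r
# Hypothetical input (not from the question): note the space after the comma
genes_ws <- c("PGD", "CDA", "MROH7, TTC4", "PGM1")

# Splitting on a bare comma keeps the leading space in " TTC4"
split_plain <- unlist(strsplit(genes_ws, ","))

# scan() with strip.white = TRUE trims the fields;
# quiet = TRUE suppresses the "Read N items" message
split_scan <- scan(text = genes_ws, what = "", sep = ",",
                   strip.white = TRUE, quiet = TRUE)

# A regex split that swallows whitespace around the comma gives the same result
split_regex <- unlist(strsplit(genes_ws, "\\s*,\\s*"))

split_plain
split_scan
split_regex
```

If the data is known to be free of stray spaces, the plain strsplit/unlist version is sufficient.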
Matthew Plourde