I'm new to text mining and R coding.
I have 200 gene names made up of mixed strings. I want to split each name and put the plain words (e.g., cadherins, orphan receptors) in one column, and the numbers (e.g., 2/3) and number+letter codes (e.g., 7D, 7TM) in another column. I used strsplit to split the words. Any suggestions on how to parse them would be appreciated.
Example:
Genes <- c("7D cadherins", "7TM orphan receptors", "18S ribosomal RNAs RNA18S", "28S ribosomal RNAs RNA28S", "45S pre-ribosomal RNAs RNA45S", "5.8S ribosomal RNAs", "Actin related protein 2/3 complex")
Expected result (the 2nd and 3rd columns are what I want):
7D cadherins | cadherins | 7D
7TM orphan receptors | orphan receptors | 7TM
18S ribosomal RNAs RNA18S | ribosomal RNAs RNA18S | 18S RNA18S
28S ribosomal RNAs RNA28S | ribosomal RNAs RNA28S | 28S RNA28S
45S pre-ribosomal RNAs RNA45S | pre-ribosomal RNAs | 45S RNA45S
5.8S ribosomal RNAs | ribosomal RNAs | 5.8S
Actin related protein 2/3 complex | Actin related protein complex | 2/3
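One possible way to get this split, sketched in base R: break each name into whitespace-separated tokens with strsplit, then send every token that contains a digit to the codes column and every other token to the words column. The "contains a digit" rule, the helper name has_digit, and the column names gene, words, and codes are assumptions made for illustration. Under this rule a token such as RNA18S lands only in the codes column (as in the 45S row), so it would need adjusting if such tokens should also stay in the words column.

```r
# Example input from the question (third entry written as it appears in the
# expected result).
Genes <- c("7D cadherins", "7TM orphan receptors",
           "18S ribosomal RNAs RNA18S", "28S ribosomal RNAs RNA28S",
           "45S pre-ribosomal RNAs RNA45S", "5.8S ribosomal RNAs",
           "Actin related protein 2/3 complex")

# Split every gene name into whitespace-separated tokens (one vector per name).
tokens <- strsplit(Genes, "\\s+")

# Assumption: a token is a "code" if it contains at least one digit.
has_digit <- function(x) grepl("[0-9]", x)

# Re-join the tokens of each name into the two target columns.
result <- data.frame(
  gene  = Genes,
  words = vapply(tokens, function(tk) paste(tk[!has_digit(tk)], collapse = " "),
                 character(1)),
  codes = vapply(tokens, function(tk) paste(tk[has_digit(tk)], collapse = " "),
                 character(1)),
  stringsAsFactors = FALSE
)

print(result)
```

The same idea should scale to the full list of 200 names, since strsplit, grepl, and vapply all work on the whole vector at once.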