I have a data frame like this:
Entry name Gene names
A1BG_HUMAN A1BG
M0R009_HUMAN A1BG
F8W9F8_HUMAN A1CF
Q5T0W7_HUMAN A1CF
A1CF_HUMAN A1CF ACF ASP
H0YFH1_HUMAN A2M
A2MG_HUMAN A2M CPAMD5 FWP007
In the 1st column I have protein names and in the 2nd column I have related genes. In front of some of the proteins multiple gene names can be seen which are basically aliases of the first gene in that cell (separated by 1 space).
I want to convert this data set to a form which every protein name is paired with different gene names, so that I have sth like this:
Entry name Gene names
A1BG_HUMAN A1BG
M0R009_HUMAN A1BG
F8W9F8_HUMAN A1CF
F8W9F8_HUMAN ACF
F8W9F8_HUMAN ASP
Q5T0W7_HUMAN A1CF
Q5T0W7_HUMAN ACF
Q5T0W7_HUMAN ASP
A1CF_HUMAN A1CF
A1CF_HUMAN ACF
A1CF_HUMAN ASP
H0YFH1_HUMAN A2M
H0YFH1_HUMAN CPAMD5
H0YFH1_HUMAN FWP007
A2MG_HUMAN A2M
A2MG_HUMAN CPAMD5
A2MG_HUMAN FWP007
I know how to split the cells with multiple entries into different rows but I am not sure how to pair proteins in the first column with different aliases of a gene.
Does any one have any idea on how to do it?
EDIT: I don't want to only split the data into different rows. So cSplit will not actually help me here. Let me bring an example:
In front of A1CF_HUMAN, different aliases of A1CF gene are brought (ACF & ASP). I want to not only pair A1CF_HUMAN with ACF & ASP, but also pair other proteins which are associated with A1CF gene (F8W9F8_HUMAN & Q5T0W7_HUMAN) with ACF & ASP as well. Please take a look at what I want above to better understand what I am exactly looking for. I don't think that it can be done by a single command.