I have an output table in which each row lists multiple isoforms for a gene. The isoforms are separated by commas (','). When I import the table into R, the data frame looks like this:
Df:
gene isoform sample1_read_number p-value
A 'A1','A2','A3' 0:23,1:12,2:122 0.9,0.01,0.5
B 'B1','B2','B3' 0:3,1:45,2:76 0.43,0.001,0.12
C 'C1','C2','C3','C4' 0:5,1:56,2:166,3:7 0.004,0.002,0.23,0.12
D 'D1','D2' 0:43,1:100 0.1,0.0003
For each gene, there are multiple isoforms. For each isoform, I have a read number, separated by commas (0:23 for A1 means the read number for A1 is 23), and a p-value, also separated by commas (the p-value for A1 is 0.9 and for A2 is 0.01). So within each cell, the comma-separated values are in the same order as the isoforms.
For example, when I call df[1,2]
the result is [1] 'A1','A2','A3'
and for df[1,4]
the result is [1] 0.9,0.01,0.5
as a single object. I couldn't figure out how to make R separate the values in df[X,Y].
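To make the problem concrete, here is a minimal sketch of splitting one such cell by hand. It assumes the cell was read in as a single character string (not a factor); the variable names `cell`, `parts` and `p_values` are only illustrative.

```r
# Assumption: this string stands in for the value returned by df[1, 4]
cell <- "0.9,0.01,0.5"

# strsplit() returns a list with one element per input string,
# so [[1]] extracts the character vector for this single cell
parts <- strsplit(cell, ",", fixed = TRUE)[[1]]

# Convert the pieces from character to numeric so they can be compared
p_values <- as.numeric(parts)
p_values
```

This works for one cell at a time, but it does not yet give one row per isoform.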
The reason I want to do this is that I want to filter the data based on p-value or read number. To do that, I first need to split the data frame so that each isoform gets its own row, which means finding a way to separate the values in each cell.
The final data frame should look like this (only genes A and B shown here):
Df_I:
gene isoform sample1_read_number p-value
A A1 0:23 0.9
A A2 1:12 0.01
A A3 2:122 0.5
B B1 0:3 0.43
B B2 1:45 0.001
B B3 2:76 0.12
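One possible way to build a table of this shape in base R is sketched below. It rests on several assumptions: the three packed columns are character strings, every row has the same number of comma-separated entries in each column, and the number after the colon is the read count. The column name `p_value` (instead of `p-value`) and the extra `read_count` column are illustrative choices, not part of the original table.

```r
# Hypothetical reconstruction of the imported table (names are assumptions)
df <- data.frame(
  gene = c("A", "B", "C", "D"),
  isoform = c("'A1','A2','A3'", "'B1','B2','B3'",
              "'C1','C2','C3','C4'", "'D1','D2'"),
  sample1_read_number = c("0:23,1:12,2:122", "0:3,1:45,2:76",
                          "0:5,1:56,2:166,3:7", "0:43,1:100"),
  p_value = c("0.9,0.01,0.5", "0.43,0.001,0.12",
              "0.004,0.002,0.23,0.12", "0.1,0.0003"),
  stringsAsFactors = FALSE
)

# Split each packed column into a list of character vectors, one per gene
iso   <- strsplit(df$isoform, ",", fixed = TRUE)
reads <- strsplit(df$sample1_read_number, ",", fixed = TRUE)
pvals <- strsplit(df$p_value, ",", fixed = TRUE)

# Guard against rows where the three columns have different entry counts
stopifnot(lengths(iso) == lengths(reads), lengths(iso) == lengths(pvals))

# Repeat each gene once per isoform and flatten the split values
df_i <- data.frame(
  gene = rep(df$gene, times = lengths(iso)),
  isoform = gsub("'", "", unlist(iso), fixed = TRUE),  # drop the quotes
  sample1_read_number = unlist(reads),
  p_value = as.numeric(unlist(pvals)),
  stringsAsFactors = FALSE
)

# Assumption: the part after "index:" is the read count
df_i$read_count <- as.integer(sub("^[0-9]+:", "", df_i$sample1_read_number))

# Example filter on p-value and read number (thresholds are arbitrary)
filtered <- subset(df_i, p_value < 0.05 & read_count > 10)
```

If the tidyr package is available, its `separate_rows()` function may be a shorter alternative for the splitting step.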
Can anybody give me ideas on how to make this second data frame? Any help would be greatly appreciated!
Cheers! A