I have one problem that I can't solve it by myself. Here http://www.filedropper.com/data_31 you can download my data. It's small txt file with information about Pathway, Seqs in Pathway, Enzyme, Enzyme ID, Seqs of Enzyme, Seqs Pathway ID.
I would like to reshape/reorganize my data so it would look like something like this:
NODE_1114.... map00592 alpha-Linolenic acid metabolism map01040 Biosynthesis of unsaturated fatty acids
NODE_11280... map00592 alpha-Linolenic acid metabolism NA NA
NODE_1307.... NA NA map01040 Biosynthesis of unsaturated fatty acids
The problem is I don't know how to reorganize my data from this
NODE_12982_length_530_cov_49.8358_ID_25963,NODE_24530_length_385_cov_7.38485_ID_49059,NODE_44451_length_263_cov_34.6298_ID_88901,NODE_19986_length_437_cov_5.82461_ID_39971,
NODE_28195_length_354_cov_77.194_ID_56389
to this
NODE_12982_length_530_cov_49.8358_ID_25963
NODE_24530_length_385_cov_7.38485_ID_49059
NODE_44451_length_263_cov_34.6298_ID_88901
NODE_19986_length_437_cov_5.82461_ID_39971
NODE_28195_length_354_cov_77.194_ID_56389
and how to add additional information about Pathway and Pathway ID to each Seqs (NODE...).
Thank you for your help!
EDITED
Thanks Imo & nilsole for your answers but you missing the point.
Here is my code for data:
Pathway<-rep(c("alpha-Linolenic acid metabolism","Biosynthesis of unsaturated fatty acids"), each=5)
Seq<-c("NODE_12982_length_530_cov_49.8358_ID_25963, NODE_24530_length_385_cov_7.38485_ID_49059, NODE_44451_length_263_cov_34.6298_ID_88901, NODE_19986_length_437_cov_5.82461_ID_39971, NODE_28195_length_354_cov_77.194_ID_56389","NODE_8410_length_627_cov_229.406_ID_16819, NODE_3911_length_812_cov_32.037_ID_7821, NODE_13098_length_528_cov_13.4376_ID_26195, NODE_956_length_1151_cov_11.6797_ID_1911, NODE_4501_length_777_cov_61.2355_ID_9001, NODE_60851_length_208_cov_61.9935_ID_121701, NODE_50593_length_239_cov_608.397_ID_101185, NODE_29294_length_345_cov_1.22069_ID_58587, NODE_57887_length_216_cov_22.6087_ID_115773, NODE_14782_length_501_cov_3.03139_ID_29563, NODE_18662_length_451_cov_798.495_ID_37323, NODE_26461_length_368_cov_3.02556_ID_52921, NODE_56026_length_221_cov_2.91566_ID_112051, NODE_12405_length_540_cov_270.652_ID_24809, NODE_2990_length_874_cov_45.3675_ID_5979, NODE_4753_length_763_cov_7.11864_ID_9505, NODE_17275_length_467_cov_4.0267_ID_34549, NODE_21751_length_416_cov_41.4155_ID_43501, NODE_53355_length_230_cov_19.48_ID_106709, NODE_49191_length_244_cov_1.51852_ID_98381"
,"NODE_61001_length_208_cov_76.3987_ID_122001, NODE_14350_length_507_cov_66.9845_ID_28699, NODE_16148_length_482_cov_189.293_ID_32295, NODE_42206_length_273_cov_135.404_ID_84411, NODE_11280_length_561_cov_335.174_ID_22559, NODE_21858_length_415_cov_31.0306_ID_43715, NODE_824_length_1186_cov_6.48364_ID_1647, NODE_41473_length_276_cov_2.73303_ID_82945, NODE_46025_length_257_cov_166.455_ID_92049",
"NODE_32320_length_325_cov_56.6037_ID_64639, NODE_38741_length_289_cov_27.1795_ID_77481, NODE_9047_length_611_cov_12.6511_ID_18093, NODE_1114_length_1113_cov_24.6059_ID_2227, NODE_47802_length_250_cov_40.8513_ID_95603, NODE_60092_length_210_cov_142.471_ID_120183, NODE_28312_length_353_cov_8.38926_ID_56623",
"NODE_4925_length_754_cov_2.56509_ID_9849, NODE_16010_length_484_cov_322.536_ID_32019, NODE_51261_length_237_cov_33.9011_ID_102521, NODE_19986_length_437_cov_5.82461_ID_39971, NODE_1384_length_1058_cov_1.86939_ID_2767",
"NODE_12982_length_530_cov_49.8358_ID_25963, NODE_24530_length_385_cov_7.38485_ID_49059, NODE_44451_length_263_cov_34.6298_ID_88901, NODE_19986_length_437_cov_5.82461_ID_39971, NODE_28195_length_354_cov_77.194_ID_56389",
"NODE_32320_length_325_cov_56.6037_ID_64639, NODE_38741_length_289_cov_27.1795_ID_77481, NODE_9047_length_611_cov_12.6511_ID_18093, NODE_1114_length_1113_cov_24.6059_ID_2227, NODE_47802_length_250_cov_40.8513_ID_95603, NODE_60092_length_210_cov_142.471_ID_120183, NODE_28312_length_353_cov_8.38926_ID_56623",
"NODE_1114_length_1113_cov_24.6059_ID_2227, NODE_28195_length_354_cov_77.194_ID_56389",
"NODE_1307_length_1072_cov_19.1504_ID_2613, NODE_3418_length_843_cov_15.3959_ID_6835","NODE_4925_length_754_cov_2.56509_ID_9849, NODE_16010_length_484_cov_322.536_ID_32019, NODE_51261_length_237_cov_33.9011_ID_102521, NODE_19986_length_437_cov_5.82461_ID_39971, NODE_1384_length_1058_cov_1.86939_ID_2767")
Pathway_ID<-rep(c("map00592","map01040"),each=5)
df<-data.frame(Pathway,Seq,Pathway_ID)
Data looks like this:
Pathway Seq Pathway_ID
aplha-Linolenic acid metabolism NODE_12982...,NODE_8410.. map00592
aplha-Linolenic acid metabolism NODE....,NODE... map00592
aplha-Linolenic acid metabolism NODE....,NODE... map00592
aplha-Linolenic acid metabolism NODE....,NODE... map00592
aplha-Linolenic acid metabolism NODE....,NODE... map00592
Biosynthesis of unsaturated fatty acids NODE....,NODE... map01040
Biosynthesis of unsaturated fatty acids NODE....,NODE... map01040
Biosynthesis of unsaturated fatty acids NODE....,NODE... map01040
Biosynthesis of unsaturated fatty acids NODE....,NODE... map01040
Biosynthesis of unsaturated fatty acids NODE....,NODE... map01040
I would like to look like this:
NODE_1114.... map00592 alpha-Linolenic acid metabolism map01040 Biosynthesis of unsaturated fatty acids
NODE_11280... map00592 alpha-Linolenic acid metabolism NA NA
NODE_1307.... NA NA map01040 Biosynthesis of unsaturated fatty acids
With strsplit
function I lost information to which NODE...
each Pathway
and Pathway_ID
belongs to. In Seq column are different numbers of NODEs
, also some NODEs
belong to both pathways in such a case I want both pathways to be assign to specific NODE
; like this
NODE_1114.... map00592 alpha-Linolenic acid metabolism map01040 Biosynthesis of unsaturated fatty acids
I hope you can help me out! Thank you!