Consider a BED file containing genetic variants:
CHR START STOP RSID REF/ALT PHENOTYPE PVALUE
1 987654321 987654322 rs123456 A/T Height 6E-9
1 987654321 987654322 rs123456 A/T Stroke 8E-15
I want to collapse rows that are identical in the first 5 columns into a single row, and then merge the values of the remaining columns into comma-separated lists:
Example output:
CHR START STOP RSID REF/ALT PHENOTYPE PVALUE
1 987654321 987654322 rs123456 A/T Height,Stroke 6E-9,8E-15
Is this possible with an existing Python function or Unix tool, or do I need to write my own script?
If an existing function or tool can do it, which one?
This question was addressed here but never solved.
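For reference, the transformation described above can be sketched in plain Python. This is only a minimal sketch under a few assumptions: the input is whitespace-delimited, the first line is a header, and the whole file fits in memory. The function name `merge_variants` and the parameter `n_key` are made up for illustration.

```python
def merge_variants(lines, n_key=5):
    """Collapse rows that share the first n_key columns.

    The remaining columns are joined with commas, in input order.
    Assumes whitespace-delimited lines with a header on the first line.
    """
    header, *rows = [ln.split() for ln in lines if ln.strip()]

    groups = {}  # key columns -> one list of values per remaining column
    for fields in rows:
        key = tuple(fields[:n_key])
        rest = fields[n_key:]
        merged = groups.setdefault(key, [[] for _ in rest])
        for values, value in zip(merged, rest):
            values.append(value)

    def sort_key(key):
        # Chromosome compared as text, START and STOP as integers.
        return (key[0], int(key[1]), int(key[2]), key[3:])

    out = [" ".join(header)]
    for key in sorted(groups, key=sort_key):
        merged = groups[key]
        out.append(" ".join(list(key) + [",".join(v) for v in merged]))
    return out


example = [
    "CHR START STOP RSID REF/ALT PHENOTYPE PVALUE",
    "1 987654321 987654322 rs123456 A/T Height 6E-9",
    "1 987654321 987654322 rs123456 A/T Stroke 8E-15",
]
for line in merge_variants(example):
    print(line)
```

Values are not deduplicated within a merged column, so the n-th phenotype still lines up with the n-th p-value. For a real tab-delimited BED file, splitting and joining on "\t" instead of whitespace would probably be the safer choice.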