I have a file with ranked information:
df1
Type Rank
frameshift 1
stop_gained 2
stop_lost 3
splice_region_variant 4
splice_acceptor_variant 5
splice_donor_variant 6
missense_variant 7
coding_sequence_variant 8
intron_variant 9
NMD_transcript_variant 10
non_coding 11
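Since a lower Rank number means a higher priority, a plain dict from Type to Rank is probably the handiest form for lookups. A minimal sketch, assuming df1 is a pandas DataFrame with columns Type and Rank. The read_csv call in the comment is only a guess at the file layout, and `path` is a hypothetical name:

```python
import pandas as pd

# df1 as shown above; from a whitespace-separated file it might be loaded with
# pd.read_csv(path, sep=r"\s+")  (path is hypothetical)
df1 = pd.DataFrame({
    "Type": ["frameshift", "stop_gained", "stop_lost", "splice_region_variant",
             "splice_acceptor_variant", "splice_donor_variant", "missense_variant",
             "coding_sequence_variant", "intron_variant", "NMD_transcript_variant",
             "non_coding"],
    "Rank": list(range(1, 12)),
})

# Map each variant type to its rank (lower number = higher priority)
rank = dict(zip(df1["Type"], df1["Rank"]))
```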
and another file whose values are separated by commas (,):
df2['Variants']
A|intron_variant&NMD_transcript_variant|MODIFIER|23||,A|intron_variant&non_coding|MODIFIER|||,A|intron_variant&non_coding|MODIFIER|||
G|missense_variant&splice_region_variant|HIGH|85||,A|intron_variant&non_coding|MODIFIER|||,A|intron_variant&non_coding|MODIFIER|||
G|missense_variant|MODERATE|23||,G|frameshift&intron_variant|HIGH|||,G|intron_variant&non_coding|MODIFIER|||,G|frameshift&missense_variant|HIGH|42||
G|missense_variant|MODERATE|23||,G|intron_variant|MODIFIER|||,G|intron_variant&non_coding|MODIFIER|||,G|stop_gained&splice_region_variant|HIGH|||
G|missense_variant|MODERATE|23||
G|missense_variant&stop_lost|HIGH|12||
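Each cell therefore has two levels of nesting: entries separated by `,` and, inside the second `|`-delimited field of each entry, types joined by `&`. A minimal parsing sketch, assuming that field layout holds for every entry (`parse_variants` is a hypothetical helper name):

```python
def parse_variants(cell):
    """Split one df2['Variants'] cell into (entry, [types]) pairs.

    Assumes entries are separated by ',' and that the variant types are the
    second '|'-delimited field of each entry, joined by '&'.
    """
    parsed = []
    for entry in cell.split(","):
        fields = entry.split("|")
        types = fields[1].split("&")
        parsed.append((entry, types))
    return parsed


row = "G|missense_variant|MODERATE|23||,G|frameshift&intron_variant|HIGH|||"
for entry, types in parse_variants(row):
    print(entry, types)
```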
For each row of df2['Variants'], I want to extract the value with the highest rank according to df1 (rank 1 being the highest). One complication is that some values combine several types with &, as in frameshift&intron_variant. In such cases, I want to split on & and use the best-ranked of the individual types. The result should look like this:
Extracted Ranked
A|intron_variant&NMD_transcript_variant|MODIFIER|23|| intron_variant
G|missense_variant&splice_region_variant|HIGH|85|| splice_region_variant
G|frameshift&intron_variant|HIGH||| frameshift
G|stop_gained&splice_region_variant|HIGH||| stop_gained
G|missense_variant|MODERATE|23|| missense_variant
G|missense_variant&stop_lost|HIGH|12|| stop_lost
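One way to produce this output, sketched as a plain Python loop over each cell rather than vectorized pandas calls. It assumes that types absent from df1 should rank last and that ties keep the first entry, which matches row 3, where two frameshift entries occur. `best_variant` and `UNRANKED` are hypothetical names:

```python
import pandas as pd

# Rank lookup built from df1 (lower number = higher priority)
rank = {
    "frameshift": 1, "stop_gained": 2, "stop_lost": 3,
    "splice_region_variant": 4, "splice_acceptor_variant": 5,
    "splice_donor_variant": 6, "missense_variant": 7,
    "coding_sequence_variant": 8, "intron_variant": 9,
    "NMD_transcript_variant": 10, "non_coding": 11,
}

UNRANKED = float("inf")  # assumption: types missing from df1 sort after all ranked ones


def best_variant(cell):
    """Return (entry, type) for the best-ranked type in one comma-separated cell.

    Returns (None, None) if no type in the cell appears in df1.
    """
    best_entry, best_type, best_rank = None, None, UNRANKED
    for entry in cell.split(","):
        # The types are the second '|'-delimited field, possibly joined by '&'
        for vtype in entry.split("|")[1].split("&"):
            r = rank.get(vtype, UNRANKED)
            if r < best_rank:  # strict '<' keeps the earlier entry on ties
                best_entry, best_type, best_rank = entry, vtype, r
    return best_entry, best_type


df2 = pd.DataFrame({"Variants": [
    "A|intron_variant&NMD_transcript_variant|MODIFIER|23||,A|intron_variant&non_coding|MODIFIER|||,A|intron_variant&non_coding|MODIFIER|||",
    "G|missense_variant&splice_region_variant|HIGH|85||,A|intron_variant&non_coding|MODIFIER|||,A|intron_variant&non_coding|MODIFIER|||",
    "G|missense_variant|MODERATE|23||,G|frameshift&intron_variant|HIGH|||,G|intron_variant&non_coding|MODIFIER|||,G|frameshift&missense_variant|HIGH|42||",
    "G|missense_variant|MODERATE|23||,G|intron_variant|MODIFIER|||,G|intron_variant&non_coding|MODIFIER|||,G|stop_gained&splice_region_variant|HIGH|||",
    "G|missense_variant|MODERATE|23||",
    "G|missense_variant&stop_lost|HIGH|12||",
]})

result = pd.DataFrame(
    [best_variant(cell) for cell in df2["Variants"]],
    columns=["Extracted", "Ranked"],
    index=df2.index,
)
print(result)
```

If the original columns are needed alongside the result, `df2.join(result)` should attach Extracted and Ranked by index.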
I was able to split the values on & using the code given here, but I cannot work out how to pick the highest-ranked variant (according to df1) when a row holds multiple comma-separated values.
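Since splitting on & already works, the missing step for the comma-separated values can also be handled with pandas' `explode`. The approach expands each cell into one row per entry, then one row per type, and keeps the row with the smallest rank per original cell via `groupby(...).idxmin()`. A sketch under the same assumptions as before (types missing from df1 sort last, first entry wins ties); the column name `row` is hypothetical:

```python
import pandas as pd

# Rank lookup built from df1 (lower number = higher priority)
rank = {
    "frameshift": 1, "stop_gained": 2, "stop_lost": 3,
    "splice_region_variant": 4, "splice_acceptor_variant": 5,
    "splice_donor_variant": 6, "missense_variant": 7,
    "coding_sequence_variant": 8, "intron_variant": 9,
    "NMD_transcript_variant": 10, "non_coding": 11,
}

df2 = pd.DataFrame({"Variants": [
    "A|intron_variant&NMD_transcript_variant|MODIFIER|23||,A|intron_variant&non_coding|MODIFIER|||,A|intron_variant&non_coding|MODIFIER|||",
    "G|missense_variant&splice_region_variant|HIGH|85||,A|intron_variant&non_coding|MODIFIER|||,A|intron_variant&non_coding|MODIFIER|||",
    "G|missense_variant|MODERATE|23||,G|frameshift&intron_variant|HIGH|||,G|intron_variant&non_coding|MODIFIER|||,G|frameshift&missense_variant|HIGH|42||",
    "G|missense_variant|MODERATE|23||,G|intron_variant|MODIFIER|||,G|intron_variant&non_coding|MODIFIER|||,G|stop_gained&splice_region_variant|HIGH|||",
    "G|missense_variant|MODERATE|23||",
    "G|missense_variant&stop_lost|HIGH|12||",
]})

# One row per comma-separated entry; "row" remembers which df2 row it came from
entries = (
    df2["Variants"].str.split(",").explode()
    .rename("Extracted").rename_axis("row").reset_index()
)

# One row per '&'-joined type, taken from the second '|'-delimited field
entries["Ranked"] = entries["Extracted"].str.split("|").str[1].str.split("&")
terms = entries.explode("Ranked").reset_index(drop=True)

# Types missing from df1 get rank inf so they are never preferred
terms["Rank"] = terms["Ranked"].map(rank).fillna(float("inf"))

# idxmin returns the first position of the minimum, so the earlier entry wins ties
best = terms.loc[terms.groupby("row")["Rank"].idxmin()].set_index("row")
result = best[["Extracted", "Ranked"]]
print(result)
```

Compared with a per-cell loop, this keeps everything in pandas operations, at the cost of building an intermediate frame with one row per type.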
Thanks