I have a list of dictionaries, each containing a unique string value for the key "sample". I'm converting these values into a list for plotting, but I also want to generate a second list of the same length in which certain strings are stripped from the end of each element. This "clean" list can then be used to group certain samples together for plotting. For example, my blacklist looks like:
blacklist = ['_001', '_002', '_003', '_004', '_005', '_006', '_007', '_008', '_009',
             '_01', '_02', '_03', '_04', '_05', '_06', '_07', '_08', '_09',
             '_1', '_2', '_3', '_4', '_5', '_6', '_7', '_8', '_9']
which I want to remove from each item in this example list generated from my dictionary:
sample = [d['sample'] for d in my_stats]
sample
['sample_A', 'sample_A_001', 'sample_A_002', 'my_long_sample_B_1', 'other_sample_C_08', 'sample_A_03', 'sample1_D_07']
with the desired result of a new list:
sample
['sample_A', 'sample_A', 'sample_A', 'my_long_sample_B', 'other_sample_C', 'sample_A', 'sample1_D']
For context, I understand that some elements will then be identical. I want to use this list, together with lists of the same length generated from other keys in the dictionaries, to build a dataframe in which it serves as an id for plotting (i.e. so that I can group/color all of those values the same way). Note that the strings may contain any number of underscores, and some elements may not end in any value from the blacklist (which is why I can't use some variant of split on the last underscore, for example).
This is similar to the question "How can I remove multiple characters in a list?", but I don't want the removal to be so generalized/greedy. Ideally the strings would be removed only from the end, since the user's input file may contain parts of these strings internally (e.g. the 1 in sample1_D). I don't necessarily need to use a blacklist if there's another solution; it just seemed like that might be the easiest way.
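To make the desired behaviour concrete, here is a minimal sketch of the kind of thing I have in mind. The function name strip_suffix and the variable clean are made up for illustration, and the sketch assumes that at most one blacklist entry should be removed per string and only from the very end. The regular expression is just an assumed shorthand for the blacklist above (an underscore, up to two zeros, then a digit from 1 to 9); passing an explicit blacklist falls back to a plain suffix check.

```python
import re

# Assumed shorthand for the blacklist: _1.._9, _01.._09, _001.._009,
# anchored to the end of the string so internal digits are left alone.
SUFFIX_RE = re.compile(r"_0{0,2}[1-9]$")


def strip_suffix(name, blacklist=None):
    """Return name with at most one trailing blacklist entry removed."""
    if blacklist is None:
        return SUFFIX_RE.sub("", name)
    # Check longer suffixes first so the longest matching entry wins.
    for suffix in sorted(blacklist, key=len, reverse=True):
        if name.endswith(suffix):
            return name[:-len(suffix)]
    return name


sample = ['sample_A', 'sample_A_001', 'sample_A_002', 'my_long_sample_B_1',
          'other_sample_C_08', 'sample_A_03', 'sample1_D_07']

# Same length as sample, so it can sit alongside the other lists.
clean = [strip_suffix(s) for s in sample]
print(clean)
```

Because only the end of each string is checked, the 1 in sample1_D is untouched, and elements without a blacklisted ending (such as sample_A) pass through unchanged.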