>>> randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
>>> import re
>>> [re.sub("[^ACGT]+", "", s) for s in randomList]
['ACGT', 'AG', 'AGCT']
[^ACGT]+ matches one or more (+) characters other than A, C, G, or T.
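As a minimal sketch, the same substitution can be wrapped in a small helper that compiles the pattern once. The names NON_DNA and strip_non_dna are illustrative and not part of the original answer.

```python
import re

# Hypothetical names; the pattern is compiled once and reused for every call.
NON_DNA = re.compile(r"[^ACGT]+")

def strip_non_dna(s):
    """Return s with every character other than A, C, G, T removed."""
    return NON_DNA.sub("", s)

random_list = ["ACGT", "A#$..G", "..,/\\]AGC]]]T"]
cleaned = [strip_non_dna(s) for s in random_list]
print(cleaned)
```

The character class is case-sensitive, so lowercase bases such as "acgt" would be removed as well unless the class is widened.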
Some timings (in seconds for timeit's default of 1,000,000 runs):
>>> import timeit
>>> setup = '''randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
... import re'''
>>> timeit.timeit(setup=setup, stmt='[re.sub("[^ACGT]+", "", s) for s in randomList]')
8.197133132976195
>>> timeit.timeit(setup=setup, stmt='[re.sub("[^ACGT]", "", s) for s in randomList]')
9.395620040786165
Without re, it's faster (see @cmd's answer):
>>> timeit.timeit(setup=setup, stmt="[''.join(c for c in s if c in 'ACGT') for s in randomList]")
6.874829817476666
Even faster on Python 2, using str.translate with a delete list (see @JonClement's comment):
>>> setup='''randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]\nascii_exclude = ''.join(set('ACGT').symmetric_difference(map(chr, range(256))))'''
>>> timeit.timeit(setup=setup, stmt="""[item.translate(None, ascii_exclude) for item in randomList]""")
2.814761871275735
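The translate(None, deletechars) call is the Python 2 signature. On Python 3, a rough equivalent (a sketch that is not part of the timings here) builds a deletion table with str.maketrans. Like the setup string, it only covers the first 256 code points, so any character beyond that range passes through unchanged.

```python
# Python 3 sketch: the third argument of str.maketrans lists characters to delete.
keep = set("ACGT")
delete_chars = "".join(chr(i) for i in range(256) if chr(i) not in keep)
table = str.maketrans("", "", delete_chars)

random_list = ["ACGT", "A#$..G", "..,/\\]AGC]]]T"]
cleaned = [item.translate(table) for item in random_list]
print(cleaned)
```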
Also possible (on Python 2, where filter applied to a str returns a str):
>>> setup='randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]'
>>> timeit.timeit(setup=setup, stmt="[filter(set('ACGT').__contains__, item) for item in randomList]")
4.341086316883207
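On Python 3, filter returns an iterator rather than a str, so the result has to be joined back together. A minimal sketch, untimed and with illustrative variable names:

```python
# Python 3 sketch: filter yields the kept characters lazily; join rebuilds the str.
is_base = set("ACGT").__contains__

random_list = ["ACGT", "A#$..G", "..,/\\]AGC]]]T"]
cleaned = ["".join(filter(is_base, item)) for item in random_list]
print(cleaned)
```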