1

I have a string as follows:

f = 'ATCTGTCGTYCACGT'

I want to check whether the string contains any characters except: A, C, G or T, and if so, print them.

for i in f:
    if i != 'A' and i != 'C' and i != 'G' and i != 'T':
        print(i)

Is there a way to achieve this without looping through the string?

Ch3steR
Homap
  • 2
    Do you want to check if the characters are not in the string, or do you want to print all the characters except for some characters? these are 2 different tasks – DeepSpace Jan 21 '20 at 18:37
  • I want to know if there is any character in the f string except 'A', 'C', 'G' or 'T'. If so, I want to print it. – Homap Jan 21 '20 at 18:45
  • Is your main reason for not wanting a loop the time/efficiency? Sets seem like a good place to optimize a lookup, but at the cost of the conversion of a string to a set, which can have a significant processing overhead – G. Anderson Jan 21 '20 at 18:52
  • 1
    Does this answer your question? [Check if a string contains only given characters](https://stackoverflow.com/questions/26703664/check-if-a-string-contains-only-given-characters) – Georgy Jan 21 '20 at 19:07
  • 1
    Do you mean without using a `for` loop, or without any looping construct? For example, functional programming methods (`any`, `map`) loop under the hood, and it could be argued that `regex` does as well. For clarity, do you want to avoid any function that employs a looping construct, or only the basic looping statements such as `for` and `while`? If the former, perhaps `set`; for the latter, I'd probably take a regex (`match`) approach. – SherylHohman Jan 21 '20 at 20:02

4 Answers

3

Depending on the size of your input string, the for loop might be the most efficient solution.

However, since you explicitly ask for a solution without an explicit loop, this can be done with a regex.

import re

f = 'ABCDEFG'

print(*re.findall('[^ABC]', f), sep='\n')

Outputs

D
E
F
G
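If the position of each unexpected character is also useful, the same character class can be used with `re.finditer`. This is only a sketch applied to the string from the question; the name `invalid` is made up for illustration.

```python
import re

f = 'ATCTGTCGTYCACGT'

# Collect (index, character) pairs for every character outside A, C, G, T.
invalid = [(m.start(), m.group()) for m in re.finditer(r'[^ACGT]', f)]

for index, char in invalid:
    print(index, char)
```

An empty `invalid` list means the string contains only A, C, G and T.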
DeepSpace
3

You can use a set to achieve the desired output.

f = 'ATCTGTCGTYCACGTXYZ'
valid = {'A', 'C', 'G', 'T'}  # the allowed characters
unique = set(f)
print(unique - valid)

output

{'Y', 'X', 'Z'}  # characters in f other than 'A', 'C', 'G', 'T' (set order may vary)
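A plain set drops the order in which the characters appear. If first-seen order matters, one possible sketch uses `dict.fromkeys`, which keeps insertion order. It still iterates over the string via a generator expression, so it is not loop-free in the strict sense; the names `valid` and `unexpected` are chosen here for illustration.

```python
f = 'ATCTGTCGTYCACGTXYZ'
valid = set('ACGT')

# dict.fromkeys keeps the first occurrence of each key in insertion order
# (guaranteed for dicts since Python 3.7).
unexpected = list(dict.fromkeys(ch for ch in f if ch not in valid))
print(unexpected)  # ['Y', 'X', 'Z']
```

Dropping the `dict.fromkeys` call and building the list directly would keep duplicates as well.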
Ch3steR
  • Correct if OP does not care about order or duplication (should `f = 'ABCA' ; not_valid = {'A'}` print `A` or `AA`)? – DeepSpace Jan 21 '20 at 18:54
  • @DeepSpace Sorry, I didn't understand your query `(should f = 'ABCA' ; not_valid = {'A'} print A or AA)?` If you don't mind can you explain again? – Ch3steR Jan 21 '20 at 19:02
  • 1
    If there are duplicated missing characters your solution will only print one of them, and also out of order (because sets are unordered). I'm not saying that it is bad, just that OP didn't specify what they want – DeepSpace Jan 21 '20 at 19:09
  • @DeepSpace My first approach was using regex, but in one of the comments OP mentioned this: *I want to know if there is any character in the f string except 'A', 'C', 'G' or 'T'*. So I thought a set is one of the options. – Ch3steR Jan 21 '20 at 19:11
0

Just do

l = ['A', 'C', 'G', 'T']

for i in f:
    if i not in l:
        print(i)

It checks whether each character of the string is in the list, and prints the ones that are not.


If you don't want to loop through the string, you can do:

import re

l = ['A', 'C', 'G', 'T']

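# The leading ^ negates the character class, so this is True when f
# contains at least one character that is not in l.
contains = bool(re.search("[^" + "".join(l) + "]", f))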
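For the yes/no part of the question alone, a set comparison is another option that avoids regex. This is a minimal sketch; `allowed` and `only_allowed` are names picked for illustration.

```python
f = 'ATCTGTCGTYCACGT'
allowed = set('ACGT')

# True when every character of f is one of the allowed characters.
only_allowed = set(f) <= allowed
print(only_allowed)  # False, because of the 'Y'
```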
Kumpelinus
  • 3
    `"Is there a way to achieve this without looping through the string?"` This code does exactly what OP's does, just in a neater way (which wasn't the premise of the question) – DeepSpace Jan 21 '20 at 18:41
  • That is right. I have edited it so that it now meets the requirements – Kumpelinus Jan 21 '20 at 18:57
0

Technically this still loops, but the input string is first converted to a set, which removes duplicate values:

accepted_values = ['a','t','c','g']

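sequence = 'ATCTGTCGTYCACGT'  # renamed from `input` to avoid shadowing the built-in

# The string is lowercased first, so the result is reported in lowercase.
print([i for i in set(sequence.lower()) if i not in accepted_values])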
badger0053