I want to check whether some words appear in a text file, but the search needs to be case-insensitive. So I need to know how the "in" inside the if condition works, and to see its documentation in case it has an option for that.

I couldn't find it by searching Google, though. I tried terms like "conditional statements python" without success.

#!/usr/bin/python3

search_words = ['Day 3','day 3']

with open('test-target.txt','r') as targetFile:
    for search_word in search_words:

        if search_word in targetFile.read():
            print('yes')
        else:
            print('no')

        # put the read cursor back at the beginning of the file to prepare it for the next read ^o^
        targetFile.seek(0)

the file:

Day 3 Lab ......etc
bla bla bla

the output:

yes
no
Omar
  • That's the same as [`operator.__contains__`](https://docs.python.org/3/library/operator.html#operator.contains) – ForceBru Feb 16 '19 at 09:22

2 Answers

You can use casefold() for a case-insensitive search. You don't need seek(0) to start reading, because the file pointer is at the beginning of the file when you open it. To avoid exhausting the file on the first read(), read the file contents into a variable once and use that variable in the loop:

with open('test-target.txt','r') as targetFile:
    file_contents = targetFile.read()
    for search_word in search_words:
        if search_word.casefold() in file_contents:
            print('yes')
        else:
            print('no')
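
The comments below report that folding only the search word still misses "Day 3" when the file keeps its capital "D". A minimal sketch of a variant that folds both sides, assuming the text has already been read into a string (the helper name `found_words` and the sample text are made up for illustration):

```python
def found_words(text, search_words):
    """Map each search word to True if it occurs in text, ignoring case."""
    # Fold the text once, outside the loop, so that both sides are
    # compared in the same normalized form.
    folded_text = text.casefold()
    return {word: word.casefold() in folded_text for word in search_words}


if __name__ == '__main__':
    # Sample text standing in for the contents of test-target.txt
    sample = 'Day 3 Lab ......etc\nbla bla bla\n'
    for word, found in found_words(sample, ['Day 3', 'day 3']).items():
        print(word, 'yes' if found else 'no')
```

With the sample text above, both search words should be reported as found.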
Austin
  • Thanks, I will also use .lower() to make the file lower case as well: if search_word.casefold() in targetFile.read().lower() – Omar Feb 16 '19 at 09:29
  • The file has "Day3" with an upper-case "D", so when I used only .casefold() as you said, it gave me "no no", because it searched for "day" and "day" in a file that only had "Day". – Omar Feb 16 '19 at 09:33
  • Mixing `casefold()` and `lower()` is a bug, isn't it? It will do the wrong thing in the few cases where it actually matters whether you use one or the other. – tripleee Feb 16 '19 at 09:39
  • @tripleee, `casefold()` alone should work in scenarios like this. I don't know why OP is getting a 'no'. – Austin Feb 16 '19 at 09:41
  • @Omar, please make sure that you have a space in between, like 'Day 3'. From your comment, I gather there is no space. In such cases this returns a 'no'. – Austin Feb 16 '19 at 09:42
  • I'm not talking about a concrete problem, I am predicting that this code will fail in some scenarios which the OP may or may not have. See e.g. https://stackoverflow.com/questions/45745661/python-lower-vs-casefold-in-string-matching-and-converting-to-lowercase – tripleee Feb 16 '19 at 09:44
  • @Austin, yes, I am sure. Anyway, I think an easy final way to do this is to use .lower() on both of them: if search_word.lower() in targetFile.read().lower() – Omar Feb 16 '19 at 09:47
  • @Austin, regarding the .seek(0), I need it for the second iteration, to reset the cursor. – Omar Feb 16 '19 at 09:53

This is called the "contains" operator, a membership test operator. It doesn't come with options; it simply checks whether something is present in something else. You can, however, "normalize" both operands before checking for containment, e.g. by converting both to lower case (or upper case, or a Unicode-normalized, case-folded form, or whatever is suitable for your particular application).
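
To make the normalization concrete, here is a small sketch. It assumes plain `str` operands; `operator.contains` is the function form of `in`, and the German sample string is only an illustration of a case where `lower()` and `casefold()` give different results.

```python
import operator

text = 'Day 3 Lab ......etc'

# `in` takes no options: this is a plain, case-sensitive substring test
print('day 3' in text)                        # False
# operator.contains(a, b) is the function form of `b in a`
print(operator.contains(text, 'Day 3'))       # True

# Normalize both operands the same way before testing
print('day 3'.casefold() in text.casefold())  # True

# casefold() is more aggressive than lower(): it maps 'ß' to 'ss'
street = 'Die Straße'
print('strasse' in street.lower())            # False
print('strasse' in street.casefold())         # True
```

Whichever normalization is chosen, it should be applied identically to both sides of the test.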

Seeking back and re-reading the whole file for every search word is wasteful, though. You want to read the file into memory once:

# Normalize once, before looping
search_words = {x.lower() for x in ['Day 3', 'day 3']}

with open('test-target.txt', 'r') as targetFile:
    # Read the whole file once and lower-case it once
    contents = targetFile.read().lower()
for search_word in search_words:
    if search_word in contents:
        print('yes')
    else:
        print('no')

... or perhaps examine a line at a time:

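with open('test-target.txt', 'r') as targetFile:
    for line in targetFile:
        lowered = line.lower()  # normalize each line once
        if any(search_word in lowered for search_word in search_words):
            print('yes')
            break  # stop at the first line that matches
    else:
        # runs only if the loop finished without hitting break
        print('no')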

This will be more robust because you can handle arbitrarily large files, as long as every individual line will fit into memory.

Notice, by the by, how a for loop can have an else branch; it runs only when the loop finishes without hitting break.
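
For readers who have not met it before, a short sketch of how the else branch of a for loop behaves. The function name `first_match` and the sample lines are invented for this example.

```python
def first_match(lines, word):
    """Return the first line containing word (ignoring case), or None."""
    word = word.lower()
    for line in lines:
        if word in line.lower():
            result = line
            break        # leaving via break skips the else branch
    else:
        # Reached only when the loop ran to completion without break
        result = None
    return result


lines = ['Day 3 Lab ......etc', 'bla bla bla']
print(first_match(lines, 'day 3'))   # Day 3 Lab ......etc
print(first_match(lines, 'day 4'))   # None
```

The else branch also runs when the iterable is empty, since the loop then finishes without ever reaching break.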

As a usability improvement, the message you print should probably identify which search word was or wasn't found in each iteration.
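
One way to do that while still reading a line at a time is sketched below. It is only an illustration: the helper name `report_words` is hypothetical, and it accepts any iterable of lines (an open file works too) so that the example runs without a file on disk.

```python
import io


def report_words(lines, search_words):
    """Return {original word: found?} after scanning lines one at a time."""
    # Map each normalized word back to the spellings the caller passed in
    pending = {}
    for word in search_words:
        pending.setdefault(word.lower(), []).append(word)
    results = {word: False for word in search_words}

    for line in lines:
        lowered = line.lower()
        # Collect the matching keys first, then remove them from pending
        for key in [k for k in pending if k in lowered]:
            for word in pending.pop(key):
                results[word] = True
        if not pending:
            break  # every word has been found; stop reading
    return results


# io.StringIO stands in for open('test-target.txt')
sample = io.StringIO('Day 3 Lab ......etc\nbla bla bla\n')
for word, found in report_words(sample, ['Day 3', 'day 4']).items():
    print(f'{word!r}: {"yes" if found else "no"}')
```

Because each line is tested separately, a phrase that spans a line break would not be found by this approach.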

tripleee