
I'm currently using the find function and found a slight problem.

theres gonna be a fire here

If a sentence contains both "here" and "theres" and I use find() to get the index of "here", I instead get the index of the match inside "theres".

I thought find() would behave like `if thisword in thatword:`, matching the whole word rather than a substring within the string.

Is there another function that works similarly? I use find() quite heavily and would like to know of alternatives before I clog the code with string.split() and then iterate, with an index counter on the side, until I find the exact match.

MainLine = 'theres gonna be a fire here'
WordtoFind = 'here'
split_line = MainLine.split()

# Track the character offset at which each word starts.
indexCounter = 0
String_Len = -1  # stays -1 if the word is never found
for token in split_line:
    if token == WordtoFind:  # exact match, not a substring test
        String_Len = indexCounter
        break
    indexCounter += len(token) + 1  # +1 assumes a single space between words
  • why not look for " here" with the leading space? – Daniel Nov 30 '18 at 21:59
  • `str.find` finds substrings, not *words*. It has no notion of *words*. What you described, using `split` and then iterating, is the first step beyond find: tokenization and search. The way to avoid clogging the code is to reuse code through functions, classes, etc. – juanpa.arrivillaga Nov 30 '18 at 22:00
  • @DanielJimenez not if "here" could be the first word; better to split or use a regex – Chris_Rands Nov 30 '18 at 22:02
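
A minimal sketch of the split-then-search idea from the comments, wrapped in a function so the loop is written only once. The name `find_word_index` is hypothetical, and the sketch assumes words are separated by plain spaces and carry no attached punctuation (so "here," would not match "here").

```python
def find_word_index(word, text):
    """Return the start index of the first space-delimited token equal to word, or -1."""
    position = 0
    # split(" ") keeps empty tokens for repeated spaces, so offsets stay accurate.
    for token in text.split(" "):
        if token == word:
            return position
        position += len(token) + 1  # the token plus the space that follows it
    return -1


print(find_word_index("here", "theres gonna be a fire here"))
```

Returning -1 when nothing matches mirrors what `str.find` does, so existing call sites that check for -1 would not need to change.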

1 Answer


The best route would be regular expressions. To find a "word", just make sure that the characters immediately before and after it are not alphanumeric. This uses no splits, has no exposed loops, and even works on an awkward sentence like "There is a fire,here". A find_word_start function might look like this:

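import re

def find_word_start(word, string):
    # The lookarounds require that the characters on either side of the word
    # are not letters or digits; re.escape keeps metacharacters in word literal.
    pattern = "(?<![a-zA-Z0-9])" + re.escape(word) + "(?![a-zA-Z0-9])"
    result = re.search(pattern, string)
    # Return -1 when there is no match, as str.find does.
    return result.start() if result else -1

>>> find_word_start("here", "There is a fire,here")
16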

The regex uses a technique called lookarounds, which check that the characters immediately before and after the word are not letters or digits without including them in the match: https://www.regular-expressions.info/lookaround.html. The term [a-zA-Z0-9] is a character class that matches a single character in any of the ranges a-z, A-Z, or 0-9. Look up the Python re module to find out more about regular expressions.
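
To see what the lookarounds change, here is a small sketch that runs both `str.find` and the pattern above against the sentence from the question. The variable names are only for illustration.

```python
import re

sentence = "theres gonna be a fire here"

# str.find reports the substring inside "theres".
substring_index = sentence.find("here")
print(substring_index)  # 1

# The lookbehind rejects the candidate at index 1 because "t" precedes it,
# so the search moves on to the standalone word at the end.
pattern = r"(?<![a-zA-Z0-9])here(?![a-zA-Z0-9])"
match = re.search(pattern, sentence)
word_index = match.start()
print(word_index)  # 23
```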

sakurashinken
  • Question: if the string were "There is a fire here, and here as well!", it gives me only the first location. Is there any way to get both locations? – Haroon Piracha Dec 04 '18 at 01:07
  • Try the finditer function, iterate over the result, and call m.start() on each match: https://stackoverflow.com/questions/4664850/find-all-occurrences-of-a-substring-in-python – sakurashinken Dec 04 '18 at 01:18
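
The finditer suggestion could be sketched roughly as follows. The function name `find_word_starts` is hypothetical, and the `re.escape` call is an addition so that a word containing regex metacharacters is treated literally.

```python
import re

def find_word_starts(word, string):
    """Return the start index of every whole-word occurrence of word in string."""
    pattern = "(?<![a-zA-Z0-9])" + re.escape(word) + "(?![a-zA-Z0-9])"
    # re.finditer yields one match object per non-overlapping match, left to right.
    return [m.start() for m in re.finditer(pattern, string)]


print(find_word_starts("here", "There is a fire here, and here as well!"))  # [16, 26]
```

The "here" inside "There" is skipped because it is preceded by a letter, and an empty list comes back when the word does not occur at all.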