Finding position of utf8 substring in string in python

Question

I am trying to find a utf-8 substring in a string.

Here is my code:

str = u'haha i am going home'
substr1 = u'haha'
substr2 = u'ha'

If I run

str.find(substr1) #returns 0 
str.find(substr2) #returns 0

I would like str.find(substr2) to return -1 instead, because I want to match whole words only.

You don't have UTF-8 strings. You have Unicode strings; the two concepts are related but very much not the same thing. One is encoded bytes, the other is text. — Martijn Pieters, Apr 20 '15 at 07:38
`str` is a Python type, be sure to use another variable name. — 101, Apr 20 '15 at 07:38
FWIW, a simple way to do this without regex is to append a space to both your target string and your search strings. OTOH, unlike the regex approach, that doesn't handle punctuation. — PM 2Ring, Apr 20 '15 at 07:44
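The space-padding idea from the last comment can be sketched as follows. This is only an illustration: find_word_padded is a hypothetical helper name, and it assumes words are separated by single spaces with no punctuation attached.

```python
def find_word_padded(text, word):
    """Return the index of `word` in `text` as a space-delimited word, or -1.

    Hypothetical helper: pads both strings with spaces so that only
    whole, space-separated words can match.
    """
    # The leading pad shifts every character right by one, and the match
    # starts on the space before the word, so the index found in the
    # padded string equals the word's index in the original text.
    return (u' ' + text + u' ').find(u' ' + word + u' ')


text = u'haha i am going home'
print(find_word_padded(text, u'haha'))  # 0
print(find_word_padded(text, u'ha'))    # -1
print(find_word_padded(text, u'home'))  # 16
```

As the comment notes, this does not handle punctuation: searching for u'home' in u'going home.' gives -1.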

score 4 · Accepted Answer · answered Apr 20 '15 at 07:39

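Use a regular expression with word-boundary anchors (\b) so that only whole words match. Pass the substring through re.escape in case it contains regex metacharacters.

import re

text = u'haha i am going home'  # renamed from str to avoid shadowing the built-in type
substr1 = u'haha'
substr2 = u'ha'

# \b matches only at a word boundary; re.UNICODE makes \b Unicode-aware on Python 2
match = re.search(r'\b%s\b' % re.escape(substr1), text, re.UNICODE)

if match:
    print("found substring 1")  # printed: 'haha' is a whole word

match = re.search(r'\b%s\b' % re.escape(substr2), text, re.UNICODE)

if match:
    print("found substring 2")  # not printed: 'ha' only occurs inside 'haha'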
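Since the question asks for a position rather than a yes/no result, here is a minimal sketch of a find-like wrapper around the same \b pattern. The name find_word is hypothetical, not part of the re module; the helper returns the start index of the first whole-word match, or -1 to mirror what str.find does on failure.

```python
import re


def find_word(text, word):
    """Return the index of the first whole-word match of `word`, or -1.

    Hypothetical helper mirroring the return convention of str.find.
    """
    # re.escape guards against regex metacharacters in `word`;
    # re.UNICODE makes \b Unicode-aware on Python 2 (it is already the
    # default for text patterns on Python 3).
    pattern = r'\b' + re.escape(word) + r'\b'
    match = re.search(pattern, text, re.UNICODE)
    return match.start() if match else -1


text = u'haha i am going home'
print(find_word(text, u'haha'))   # 0
print(find_word(text, u'ha'))     # -1
print(find_word(text, u'going'))  # 10
```

Unlike padding with spaces, \b also treats punctuation as a boundary, so find_word(u'going home.', u'home') returns 6.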
