
I am trying to find a UTF-8 substring within a string.

Here is my code:

str = u'haha i am going home'
substr1 = u'haha'
substr2 = u'ha'

If I run

str.find(substr1) #returns 0 
str.find(substr2) #returns 0 

I would like str.find(substr2) to return -1 instead, because I want to match whole words only.

Martijn Pieters
aceminer
  • You don't have UTF-8 strings. You have Unicode strings; the two concepts are related but very much not the same thing. One is encoded bytes, the other is text. – Martijn Pieters Apr 20 '15 at 07:38
  • Write your own function. – khajvah Apr 20 '15 at 07:38
  • `str` is a Python type, be sure to use another variable name. – 101 Apr 20 '15 at 07:38
  • FWIW, a simple way to do this without regex is to append a space to both your target string and your search strings. OTOH, unlike the regex approach, that doesn't handle punctuation. – PM 2Ring Apr 20 '15 at 07:44
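
The space-padding idea from the last comment could look roughly like the sketch below. The helper name `contains_word` is made up for illustration, and padding on both sides (rather than only appending) is an assumption about what the commenter meant. It returns a boolean rather than an index, and, as the comment warns, it does not cope with punctuation.

```python
def contains_word(text, word):
    """Return True if `word` appears in `text` as a space-delimited token.

    Hypothetical helper: both strings are padded with one space on each
    side, so a hit must be bounded by spaces or by the ends of `text`.
    """
    return (u' ' + word + u' ') in (u' ' + text + u' ')


text = u'haha i am going home'
print(contains_word(text, u'haha'))  # True
print(contains_word(text, u'ha'))    # False: 'ha' only occurs inside 'haha'

# Punctuation defeats this approach: 'haha,' is not the token 'haha'.
print(contains_word(u'haha, i am going home', u'haha'))  # False
```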

1 Answer

Use a regular expression with word boundaries (\b), so that the search term only matches as a whole word:

import re

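text = u'haha i am going home'  # renamed from str, which shadows the built-in type
substr1 = u'haha'
substr2 = u'ha'

# \b matches only at a word boundary; re.escape keeps any regex
# metacharacters in the search term literal.
match = re.search(r'\b%s\b' % re.escape(substr1), text)

if match:
    print("found substring 1")  # printed: 'haha' is a whole word

match = re.search(r'\b%s\b' % re.escape(substr2), text)

if match:
    print("found substring 2")  # not printed: 'ha' only occurs inside 'haha'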
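
To get the `find`-style result the question asks for (an index, or -1 when there is no whole-word match), the same pattern can be wrapped in a small function. This is only a sketch: `find_word` is a hypothetical name, and it assumes the search term begins and ends with a word character, since `\b` behaves differently next to punctuation. The `re.UNICODE` flag matters on Python 2 for non-ASCII text and is redundant for `str` patterns on Python 3.

```python
import re


def find_word(text, word):
    """Return the index of the first whole-word match of `word`, or -1.

    Hypothetical helper mirroring the return convention of str.find.
    """
    pattern = r'\b' + re.escape(word) + r'\b'
    match = re.search(pattern, text, re.UNICODE)
    return match.start() if match else -1


text = u'haha i am going home'
print(find_word(text, u'haha'))   # 0
print(find_word(text, u'ha'))     # -1
print(find_word(text, u'going'))  # 10
```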
lapinkoira