
My text editor (vim) can report the position of a string within another string, but it counts bytes, not characters.

Example:

s="I don't take an apéritif après-ski"

When I search for the word apéritif, my text editor gives the position:
16,25

Python gives this position for the same word:
16,24
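
Both results can be reproduced in Python 3 as a quick check. This is only a sketch and assumes the text is UTF-8 encoded, where é and è each take two bytes; the variable names are illustrative:

```python
s = "I don't take an apéritif après-ski"
word = "apéritif"

# Character positions: what str.find() and slicing work with
char_start = s.find(word)
char_end = char_start + len(word)

# Byte positions: what the editor reports for UTF-8 text
b = s.encode("utf-8")
bw = word.encode("utf-8")
byte_start = b.find(bw)
byte_end = byte_start + len(bw)

print(char_start, char_end)  # 16 24
print(byte_start, byte_end)  # 16 25
```

The start positions agree only because no accented character precedes the word; the end positions differ by one because of the é.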

Vim makes it possible to execute Python code inside the editor.
In one of my Python scripts I do a lot of slicing based on these positions,
but I never get the correct word when the string contains accented characters.
Is there a way to resolve this in Python?
Can I find the byte position of a string within a string in Python?

Sato Katsura
Reman

1 Answer


This is, admittedly, a naive solution: encode both the text and the word to bytes, then call find() on the encoded text with the encoded word as the argument.

def f(text, word):
    # Encode both strings so that find() returns a byte offset
    en_text = text.encode("utf-8")
    en_word = word.encode("utf-8")
    start = en_text.find(en_word)
    if start == -1:
        return None  # word not found
    return (start, start + len(en_word))

When run as:

f("I don't take an apéritif après-ski","apéritif")

returns (16, 25)
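
For slicing a Python str with positions that came from the editor, the opposite conversion may be more useful. A minimal sketch, assuming UTF-8 text and byte offsets that fall on character boundaries; byte_to_char_offset is a hypothetical helper name, not part of any library:

```python
def byte_to_char_offset(text, byte_offset, encoding="utf-8"):
    """Return the character index that corresponds to a byte offset in text."""
    # Decode only the bytes before the offset and count the characters.
    # Raises UnicodeDecodeError if the offset splits a multi-byte character.
    return len(text.encode(encoding)[:byte_offset].decode(encoding))

s = "I don't take an apéritif après-ski"
start = byte_to_char_offset(s, 16)
end = byte_to_char_offset(s, 25)
print(s[start:end])  # apéritif
```

Alternatively, the slice can be taken on the encoded bytes directly and decoded afterwards, as in s.encode("utf-8")[16:25].decode("utf-8").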

Shihab Shahriar Khan