I'm having trouble matching accented words in a Spanish text with Python. I need to find the first paragraph in a large text that starts with the words 'Nombre vernáculo'.
For example, the text looks like: "Nombre vernáculo registrado en la zona de ..."
But my Python script does not recognize the accented words.
I've tried:
re.compile('/(?<!\p{L})(vern[áa]culo*)(?!\p{L})/')
re.compile(r'Nombre vern[a\xc3\xa1]culo\.', re.UNICODE)
re.compile ('[A-Z][a-záéíóúñ]+')
[\p{Lu}] [\p{Ll}]+ \b
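A note on the `\p{L}` attempts: as far as I can tell, the standard `re` module does not support `\p{...}` Unicode property classes at all. The sketch below assumes Python 3, where `str` patterns are Unicode-aware by default, and uses `[^\W\d_]` as a stand-in for "any letter". The names `letter` and `word` are just illustrative.

```python
import re

# The stdlib `re` module rejects \p{L}; on Python 3.6+ an unknown
# escape made of a backslash and an ASCII letter raises re.error.
try:
    re.compile(r"(?<!\p{L})(vern[áa]culo)(?!\p{L})")
    supports_p_classes = True
except re.error:
    supports_p_classes = False

# Stand-in for \p{L}: a word character that is neither a digit nor "_".
# Assumes Python 3, where \w covers Unicode letters for str patterns.
letter = r"[^\W\d_]"
word = re.compile(r"(?<!%s)(vern[áa]culo)(?!%s)" % (letter, letter))

hit = word.search("Nombre vernáculo registrado en la zona")
miss = word.search("invernáculos")  # embedded in a longer word
```

Here `hit.group(1)` should be the standalone word, while `miss` should be `None` because the candidate is surrounded by other letters.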
I've read the following threads:
grep/regex can't find accented word
Python Regex strange behavior with accented characters
Python regex and accented Expression
Python: using regex and tokens with accented chars (negative lookbehind)
I also found something that almost works:
In [95]: dd=re.search(r'^\w.*', 'Nombre vernáculo' )
In [96]: dd.group(0)
Out[96]: 'Nombre vern\xc3\xa1culo'
But it also returns all accented words in the text.
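To make the goal concrete, here is a minimal sketch of the behavior I'm after. It rests on assumptions that do not come from my real data: Python 3 (where `str` is Unicode), a made-up sample text, and paragraphs separated by blank lines. The names `text`, `pattern` and `first_paragraph` are only for illustration.

```python
import re
import unicodedata

# Hypothetical sample text; paragraphs are assumed to be separated
# by blank lines.
text = (
    "Introducción general del documento.\n"
    "\n"
    "Nombre vernáculo registrado en la zona de estudio.\n"
    "Segunda línea del mismo párrafo.\n"
    "\n"
    "Nombre vernáculo alternativo en otro párrafo.\n"
)

# Normalize to NFC so that "á" is a single code point rather than
# "a" followed by a combining accent.
text = unicodedata.normalize("NFC", text)

# ^ anchors at the start of a line (MULTILINE); the lazy .*? (DOTALL)
# runs up to the next blank line or the end of the text.
pattern = re.compile(
    r"^Nombre vern[áa]culo\b.*?(?=\n\s*\n|\Z)",
    re.MULTILINE | re.DOTALL,
)

match = pattern.search(text)
first_paragraph = match.group(0) if match else None
print(first_paragraph)
```

With this sample, only the first paragraph beginning with 'Nombre vernáculo' should come back, including its second line, and the later paragraph should be ignored.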
Any help with this would be appreciated. Thanks.