I am scraping a Portuguese website in Python 2.7, and I want to separate the words (which contain accented Latin characters) from the numeric code that appears between parentheses. Each text looks like:
text = 'Obras de revisão e recuperação (45453000-7)'
I tried the following code:
#-*- coding: utf-8 -*-
import re
text = u'Obras de revisão e recuperação (45453000-7)'
re.sub(r'\([0-9-]+\)', u'', text).encode("utf8")
The output is:
'Obras de revis\xc3\xa3o e recupera\xc3\xa7\xc3\xa3o '
I want to remove the parentheses as well and get the name and the code as two separate values, like:
name = 'Obras de revisão e recuperação'
code = '45453000-7'
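For reference, here is a minimal sketch of the result I am after, using capture groups instead of `re.sub`. The helper name `split_name_and_code` and the exact pattern are my own assumptions, not something from the site or a library; it assumes the code is always the last thing in the string and contains only digits and hyphens.

```python
# -*- coding: utf-8 -*-
import re

# Assumed pattern: group 1 is the text before the parentheses,
# group 2 is the run of digits and hyphens inside them.
PATTERN = re.compile(r'^(.*?)\s*\(([0-9-]+)\)\s*$', re.UNICODE)


def split_name_and_code(text):
    """Hypothetical helper: return (name, code) from a unicode string.

    If no parenthesised code is found, the stripped text is returned
    as the name and the code is None.
    """
    match = PATTERN.match(text)
    if match:
        return match.group(1), match.group(2)
    return text.strip(), None


text = u'Obras de revisão e recuperação (45453000-7)'
name, code = split_name_and_code(text)
```

Both `name` and `code` are unicode objects here. The `\xc3\xa3` sequences in my output above are only the `repr` of the UTF-8 encoded byte string; printing the value shows the accented characters.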