I'm trying to parse a sentence like Base: Lote Numero 1, Marcelo T de Alvear 500. Demanda: otras palabras.
I want to first split the text on periods, and then use whatever comes before each colon as the label
for the sentence that follows the colon.
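To make the intended result concrete, here is a rough sketch of it in plain Python, without pyparsing. The helper name `split_labeled_sentences` is made up for illustration, and the sketch assumes that periods only ever end a sentence, so an abbreviation such as "Av." inside a value would break it.

```python
# -*- coding: utf-8 -*-
def split_labeled_sentences(text):
    """Return a list of (label, value) pairs from 'Label: value. Label: value.' text."""
    pairs = []
    for sentence in text.split('.'):
        sentence = sentence.strip()
        if not sentence:
            # Skip the empty piece left after the trailing period.
            continue
        # Everything before the first colon is the label, the rest is the value.
        label, _, value = sentence.partition(':')
        pairs.append((label.strip(), value.strip()))
    return pairs


text = "Base: Lote Numero 1, Marcelo T de Alvear 500. Demanda: otras palabras."
pairs = split_labeled_sentences(text)
```

Each value comes back as one string rather than a list of words, which is the shape I would like the parser to produce.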
Right now I have the following definition:
from pyparsing import *

unicode_printables = u''.join(unichr(c) for c in xrange(65536)
                              if not unichr(c).isspace())

def parse_test(text):
    label = Word(alphas) + Suppress(':')
    value = OneOrMore(Word(unicode_printables) | Literal(','))
    group = Group(label.setResultsName('label') + value.setResultsName('value'))
    exp = delimitedList(
        group,
        delim='.'
    )
    return exp.parseString(text)
It kind of works, but it drops the Unicode characters (and anything else that is not in alphanums). I would also like to get the value
as a whole sentence rather than as a list of tokens like this: 'value': [(([u'Lote', u'Numero', u'1', ',', u'Marcelo', u'T', u'de', u'Alvear', u'500'], {}), 1)
Is there a simple way to tackle this?
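One direction that might work is to stop tokenizing the value word by word and instead take everything up to the next period as a single token with `CharsNotIn('.')`. The sketch below is only that, a sketch: it assumes Python 3 and a pyparsing version that still provides `delimitedList` and `setParseAction`, the function name `parse_labeled` is made up, and it still assumes that a value never contains a period.

```python
from pyparsing import CharsNotIn, Group, Suppress, Word, alphas, delimitedList


def parse_labeled(text):
    # Label: a run of ASCII letters followed by a colon (the colon is dropped).
    label = Word(alphas)('label') + Suppress(':')
    # Value: every character up to the next period, kept as ONE token.
    # CharsNotIn does not skip leading whitespace, so strip it afterwards.
    value = CharsNotIn('.').setParseAction(lambda tokens: tokens[0].strip())
    group = Group(label + value('value'))
    # Groups are separated by periods; a trailing period is simply left unparsed.
    return delimitedList(group, delim='.').parseString(text)


result = parse_labeled(
    "Base: Lote Numero 1, Marcelo T de Alvear 500. Demanda: otras palabras."
)
rows = result.asList()
```

Because `CharsNotIn` excludes characters instead of listing the allowed ones, non-ASCII letters in the value should pass through untouched, with no need for a hand-built `unicode_printables` string.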