I'd like to parse tag/value descriptions that use the delimiters ":", "," and "•".
E.g. the input would be:
Name:Test•Title: Test•Keywords: A,B,C
The expected result is the name/value dict:
{
"name": "Test",
"title": "Test",
"keywords": "A,B,C"
}
Ideally the keywords in "A,B,C" would already be split into a list. (This is a minor detail, since Python's built-in str.split method will happily do this.)
Also, applying a mapping
keys={
"Name": "name",
"Title": "title",
"Keywords": "keywords",
}
between tag names and dict keys would be helpful, but this could be a separate step.
I tried the code below (https://trinket.io/python3/8dbbc783c7):
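To pin down the target, here is a minimal plain-Python sketch (no pyparsing) that produces the structure I am after from the sample input. The function name parse_notes is hypothetical and only used for illustration; the sketch assumes that tags never contain ":" and that values never contain "•".

```python
notes_text = "Name:Test•Title: Test•Keywords: A,B,C"
keys = {
    "Name": "name",
    "Title": "title",
    "Keywords": "keywords",
}


def parse_notes(text: str, key_map: dict) -> dict:
    """Split on the run delimiter, then on the first colon of each part."""
    result = {}
    for part in text.split("•"):
        tag, _, value = part.partition(":")
        tag = tag.strip()
        # fall back to the raw tag if it is not in the mapping
        result[key_map.get(tag, tag)] = value.strip()
    return result


expected = parse_notes(notes_text, keys)
# optional extra step: split the keywords into a list
expected["keywords"] = expected["keywords"].split(",")
print(expected)
# {'name': 'Test', 'title': 'Test', 'keywords': ['A', 'B', 'C']}
```

The pyparsing grammar should ideally yield the same dict.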
# pyparsing named values
# Wolfgang Fahl
# 2023-01-28 for Stackoverflow question
import pyparsing as pp
notes_text="Name:Test•Title: Test•Keywords: A,B,C"
keys={
"Name": "name",
"Titel": "title",
"Keywords": "keywords",
}
keywords=list(keys.keys())
runDelim="•"
name_values_grammar=pp.delimited_list(
    pp.oneOf(keywords,as_keyword=True).setResultsName("key",list_all_matches=True)
    +":"+pp.Suppress(pp.Optional(pp.White()))
    +pp.delimited_list(
        pp.OneOrMore(pp.Word(pp.printables+" ", exclude_chars=",:"))
        ,delim=",")("value")
    ,delim=runDelim).setResultsName("tag", list_all_matches=True)
results=name_values_grammar.parseString(notes_text)
print(results.dump())
and variations of it, but I am not even close to the expected result. Currently the dump shows:
['Name', ':', 'Test']
- key: 'Name'
- tag: [['Name', ':', 'Test']]
  [0]:
    ['Name', ':', 'Test']
    - value: ['Test']
It seems I don't know how to define the grammar and how to work with the parse result in a way that yields the needed dict.
The main questions for me are:
- Should I use parse actions?
- How is the naming of partial results done?
- How is the navigation of the resulting tree done?
- How is it possible to get the list back from delimitedList?
- What does list_all_matches=True achieve? Its behavior seems strange.
I searched for answers to the above questions here on Stack Overflow and couldn't find a consistent picture of what to do. Related questions I found:
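On the list_all_matches question: as far as I understand it, a results name attached to a repeated element keeps only the last match by default, while list_all_matches=True accumulates every match under that name. A minimal sketch, assuming pyparsing 3.x; the toy grammar and the names last_only and all_matches are made up for illustration:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

# default: the name "w" is overwritten on every repetition
last_only = pp.OneOrMore(word.set_results_name("w"))
# list_all_matches=True: every match is collected under "w"
all_matches = pp.OneOrMore(word.set_results_name("w", list_all_matches=True))

r1 = last_only.parse_string("a b c")
r2 = all_matches.parse_string("a b c")

print(r1.as_list())       # ['a', 'b', 'c']
print(r1["w"])            # c
print(r2["w"].as_list())  # ['a', 'b', 'c']
```

In both cases the token list itself is complete; only the content stored under the results name differs.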
- Pyparsing delimited list only returns first element
- Finding lists of elements within a string using Pyparsing
PyParsing seems to be a great tool, but I find it very unintuitive. Fortunately there are lots of answers here, so I hope to learn how to get this example working.
Trying it myself, I took a stepwise approach:
First I checked the delimitedList behavior, see https://trinket.io/python3/25e60884eb
# Try out pyparsing delimitedList
# WF 2023-01-28
from pyparsing import printables, OneOrMore, Word, delimitedList
notes_text="A,B,C"
comma_separated_values=delimitedList(Word(printables+" ", exclude_chars=",:"),delim=",")("clist")
grammar = comma_separated_values
result=grammar.parseString(notes_text)
print(f"result:{result}")
print(f"dump:{result.dump()}")
print(f"asDict:{result.asDict()}")
print(f"asList:{result.asList()}")
which returns
result:['A', 'B', 'C']
dump:['A', 'B', 'C']
- clist: ['A', 'B', 'C']
asDict:{'clist': ['A', 'B', 'C']}
asList:['A', 'B', 'C']
This looks promising: the key success factor seems to be naming this list "clist", and the default behavior looks fine.
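One thing worth checking at this step is what happens once the named list is embedded in a larger expression. As far as I can tell, a results name alone does not create nesting; wrapping the list in pp.Group does. A small sketch, assuming pyparsing 3.x; the expressions flat and nested are illustrative and not part of my original attempts:

```python
import pyparsing as pp

item = pp.Word(pp.alphas)
tag = pp.Word(pp.alphas)("key")

# without Group the list items are merged into the surrounding token list
flat = tag + pp.Suppress(":") + pp.delimited_list(item)("clist")
# with Group the list items stay together as one nested sub-list
nested = tag + pp.Suppress(":") + pp.Group(pp.delimited_list(item))("clist")

rf = flat.parse_string("Keywords: A,B,C")
rn = nested.parse_string("Keywords: A,B,C")

print(rf.as_list())           # ['Keywords', 'A', 'B', 'C']
print(rn.as_list())           # ['Keywords', ['A', 'B', 'C']]
print(rn["key"])              # Keywords
print(rn["clist"].as_list())  # ['A', 'B', 'C']
```

Navigation then works either by position (rn[1]) or by name (rn["clist"]).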
https://trinket.io/python3/bc2517e25a shows in more detail where the problem is.
# Try out pyparsing delimitedList
# see https://stackoverflow.com/q/75266188/1497139
# WF 2023-01-28
from pyparsing import printables, oneOf, OneOrMore,Optional, ParseResults, Suppress,White, Word, delimitedList
def show_result(title:str,result:ParseResults):
    """
    show pyparsing result details
    Args:
        result(ParseResults)
    """
    print(f"result for {title}:")
    print(f" result:{result}")
    print(f" dump:{result.dump()}")
    print(f" asDict:{result.asDict()}")
    print(f" asList:{result.asList()}")
    # asXML is deprecated and doesn't work any more
    # print(f"asXML:{result.asXML()}")
notes_text="Name:Test•Title: Test•Keywords: A,B,C"
comma_text="A,B,C"
keys={
"Name": "name",
"Titel": "title",
"Keywords": "keywords",
}
keywords=list(keys.keys())
runDelim="•"
comma_separated_values=delimitedList(Word(printables+" ", exclude_chars=",:"),delim=",")("clist")
cresult=comma_separated_values.parseString(comma_text)
show_result("comma separated values",cresult)
grammar=delimitedList(
    oneOf(keywords,as_keyword=True)
    +Suppress(":"+Optional(White()))
    +comma_separated_values
    ,delim=runDelim
)("namevalues")
nresult=grammar.parseString(notes_text)
show_result("name value list",nresult)
#ogrammar=OneOrMore(
#    oneOf(keywords,as_keyword=True)
#    +Suppress(":"+Optional(White()))
#    +comma_separated_values
#)
#oresult=ogrammar.parseString(notes_text)
#show_result("name value list with OneOrMore",oresult)
output:
result for comma separated values:
result:['A', 'B', 'C']
dump:['A', 'B', 'C']
- clist: ['A', 'B', 'C']
asDict:{'clist': ['A', 'B', 'C']}
asList:['A', 'B', 'C']
result for name value list:
result:['Name', 'Test']
dump:['Name', 'Test']
- clist: ['Test']
- namevalues: ['Name', 'Test']
asDict:{'clist': ['Test'], 'namevalues': ['Name', 'Test']}
asList:['Name', 'Test']
While the first result makes sense to me, the second is unintuitive. I had expected a nested result: a dict containing a dict of lists.
What causes this unintuitive behavior and how can it be mitigated?
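Two things appear to contribute to the result above. First, the keys dict in the code spells "Titel" while the input contains "Title", so the keyword match fails after the first pair, and parseString without parse_all stops there without raising an error. Second, nothing in the grammar uses pp.Group, so all tokens end up in one flat list. The following is a minimal sketch of a grammar that addresses both, assuming pyparsing 3.x and the spelling "Title"; it uses a parse action for the key mapping and pp.Dict to expose the groups as a dict. The unwrapping of single-item lists at the end is my own simplification, not something pyparsing prescribes.

```python
import pyparsing as pp

notes_text = "Name:Test•Title: Test•Keywords: A,B,C"
# assumption: "Title" is spelled as in the input (the attempts above use "Titel")
keys = {
    "Name": "name",
    "Title": "title",
    "Keywords": "keywords",
}
run_delim = "•"

# parse action: replace the matched tag with its dict key
key = pp.one_of(list(keys), as_keyword=True)
key.add_parse_action(lambda tokens: keys[tokens[0]])

# one value item, using the same Word expression as in the attempts above
item = pp.Word(pp.printables + " ", exclude_chars=",:")
# Group keeps the comma separated items of one tag together
values = pp.Group(pp.delimited_list(item, delim=","))
# Group each key/value pair so that Dict can use the first token as key
pair = pp.Group(key + pp.Suppress(":") + values)
grammar = pp.Dict(pp.delimited_list(pair, delim=run_delim))

# parse_all=True raises a ParseException instead of stopping silently
result = grammar.parse_string(notes_text, parse_all=True)
as_lists = result.as_dict()
print(as_lists)
# {'name': ['Test'], 'title': ['Test'], 'keywords': ['A', 'B', 'C']}

# unwrap the single-item lists, keeping only the keywords as a list
final = {k: (v if k == "keywords" else v[0]) for k, v in as_lists.items()}
print(final)
# {'name': 'Test', 'title': 'Test', 'keywords': ['A', 'B', 'C']}
```

Passing parse_all=True while developing a grammar like this makes a partial match fail loudly, which would have pointed at the "Titel" mismatch right away.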