
I am trying to create a list of dictionaries from a list of strings. My attempt at the conversion is below:

author_dict = [[dict(map(str.strip, s.split(':')) for s in author_transform.split(','))] for author_transform in list_of_strings]

Everything was working fine until I encountered this piece of string:

[[country:United States,affiliation:University of Maryland, Baltimore County,name:tim oates,id:2217452330,gridid:grid.266673.0,affiliationid:79272384,order:2],........,[]]

Because this string has an extra comma (,) in the middle of the intended value of the affiliation key, my list is getting split at the wrong place. Is there a way (or an idea) I can use to avoid this kind of situation? If that is not possible, any suggestions on how I can ignore this kind of list?

abhi8569
  • 2
    Have you checked out this? https://stackoverflow.com/a/988251/13991219 – Parzival Apr 14 '21 at 17:48
  • Please edit your answer to include a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example), so that we can run it without needing additional input. – Jasmijn Apr 14 '21 at 17:58
  • @Parzival Actually, the problem here is different, because the input data is *not* a string representation of a dictionary. Otherwise the problem with the ambiguous commas wouldn’t exist in the first place. Marking this question as duplicate is an error. – inof Apr 14 '21 at 23:54
  • 1
    @inof Thanks for the message. I think I added the link as a comment, I did not mark this as a duplicate – Parzival Apr 15 '21 at 00:21

1 Answer


I would solve this by using a regular expression for splitting. This way you can split only on those commas that are followed by a colon without another comma in between.

In your code, replace

author_transform.split(',')

with

re.split(',(?=[^,]+:)', author_transform)

(And don’t forget to import re, of course.)
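To see what the lookahead does, here is a small check against a shortened version of the record from the question. The variable names are only for illustration, and the surrounding brackets are left out for clarity:

```python
import re

# Shortened version of the problematic record from the question
# (surrounding brackets removed; this is an illustrative sample).
record = (
    "country:United States,"
    "affiliation:University of Maryland, Baltimore County,"
    "name:tim oates,id:2217452330"
)

# A plain split cuts the affiliation value in two.
naive_parts = record.split(',')

# Split only on commas that are followed by "key:" with no other
# comma in between, so the comma inside the affiliation is kept.
regex_parts = re.split(r',(?=[^,]+:)', record)

print(naive_parts)
print(regex_parts)
```

The plain split produces five pieces, one of which (" Baltimore County") contains no colon and so cannot be turned into a key/value pair. The regex split produces four pieces, each with exactly one key.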

So, the whole code snippet becomes this:

import re

author_dict = [
    [
        dict(map(str.strip, s.split(':'))
        for s in re.split(',(?=[^,]+:)', author_transform))
    ]
    for author_transform in list_of_strings
]

I took the liberty of reformatting the code, so the structure of the list comprehensions becomes clear.
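For completeness, here is a self-contained sketch that can be run as-is. It rests on a few assumptions that are not in the question: each record is one string that may be wrapped in square brackets, the helper name parse_record is made up for this example, and it returns one dict per string rather than a one-element list. It also splits each pair on the first colon only, which is my addition and not part of the original code.

```python
import re

def parse_record(record):
    """Turn 'key:value,key:value' text into a dict (hypothetical helper)."""
    # Drop surrounding brackets, if any (an assumption about the input).
    record = record.strip().strip('[]')
    if not record:
        return {}
    # Split only on commas that introduce a new "key:" pair.
    pairs = re.split(r',(?=[^,]+:)', record)
    # maxsplit=1 keeps any later colons inside the value.
    return dict(map(str.strip, pair.split(':', 1)) for pair in pairs)

# Sample input based on the record shown in the question.
list_of_strings = [
    "[country:United States,affiliation:University of Maryland, "
    "Baltimore County,name:tim oates,id:2217452330,"
    "gridid:grid.266673.0,affiliationid:79272384,order:2]",
]

author_dict = [parse_record(s) for s in list_of_strings]
print(author_dict[0]['affiliation'])
```

Note that this approach still cannot tell apart a real key from a value fragment that happens to contain a colon after a comma (for example "title:Foo, part 2: bar"), so it only works as long as values do not look like that.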

inof