How to convert this tuple list to this dict in a Pythonic way?

Question

I have a list of tuples

tuplist = [('person', u'English : 1, 2, 3 ; Dutch : 5, 6, 7'), ('home', u'English : 8, 9, 10; Dutch: 11, 12, 13')]

I want to transform this to this particular dict

{'person': {u'Dutch': [u'5', u'6', u'7'], u'English': [u'1', u'2', u'3']}, 'home': {u'Dutch': [u'11', u'12', u'13'], u'English': [u'8', u'9', u'10']}}

For the moment I have this:

dic = dict(tuplist)
final_dic = {}
for x in dic:
    str = dic[x]
    list1 = [y.strip() for y in str.split(';')]
    subdict = {}
    for z in list1:
        list2 = [y.strip() for y in z.split(':')]
        subdict[list2[0]] = [y.strip() for y in list2[1].split(',')]
    final_dic[x] = subdict

But I would like to rewrite this to something more Pythonic. Does anyone have an idea?

It isn't bad; you only have ugly, meaningless variable names. More descriptive names, e.g. result instead of final_dic and nested_dictionary instead of str, would make the code much nicer and easier to read. Personally, I'd try using eval() on suitably altered strings, or (if possible) turning the data into JSON and reading that with a ready-made library. — Lorenzo Gatti, Jan 21 '14 at 17:51
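Following the comment's suggestion, here is a sketch of the question's loop with more descriptive names. The names (`result`, `languages_string`, `entry`, and so on) are illustrative choices, not from the question, and this version iterates over the pairs directly instead of building an intermediate dict first; otherwise the parsing steps are the same.

```python
tuplist = [('person', u'English : 1, 2, 3 ; Dutch : 5, 6, 7'),
           ('home', u'English : 8, 9, 10; Dutch: 11, 12, 13')]

result = {}
for category, languages_string in tuplist:
    # Each category maps to a dict of language -> list of number strings
    languages = {}
    for entry in languages_string.split(';'):
        # "English : 1, 2, 3" -> "English ", " 1, 2, 3"
        language, numbers = entry.split(':', 1)
        languages[language.strip()] = [n.strip() for n in numbers.split(',')]
    result[category] = languages
```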

score 5 · Accepted Answer · Martijn Pieters · 2014-01-21T18:01:58.077

You can nest a set of dict and list comprehensions:

{k: {l.strip(): [n.strip() for n in nums.split(',')] 
     for i in v.split(';') 
     for l, nums in (i.split(':', 1),)}
 for k, v in tuplist}

This is quite a mouthful, so it is better to move the parsing of the language string into a generator function:

def language_values(line):
    for entry in line.split(';'):
        lang, nums = entry.split(':', 1)
        yield lang.strip(), [n.strip() for n in nums.split(',')]

{k: dict(language_values(v)) for k, v in tuplist}

Either one produces the desired output:

>>> {k: {l.strip(): [n.strip() for n in nums.split(',')] 
...      for i in v.split(';') 
...      for l, nums in (i.split(':', 1),)}
...  for k, v in tuplist}
{'person': {u'Dutch': [u'5', u'6', u'7'], u'English': [u'1', u'2', u'3']}, 'home': {u'Dutch': [u'11', u'12', u'13'], u'English': [u'8', u'9', u'10']}}
>>> def language_values(line):
...     for entry in line.split(';'):
...         lang, nums = entry.split(':', 1)
...         yield lang.strip(), [n.strip() for n in nums.split(',')]
... 
>>> {k: dict(language_values(v)) for k, v in tuplist}
{'person': {u'Dutch': [u'5', u'6', u'7'], u'English': [u'1', u'2', u'3']}, 'home': {u'Dutch': [u'11', u'12', u'13'], u'English': [u'8', u'9', u'10']}}
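The least obvious part of the one-liner is `for l, nums in (i.split(':', 1),)`. Comprehensions have no assignment statement, so looping over a one-element tuple is a way to bind names to an intermediate value: the inner loop runs exactly once. A small standalone illustration, with made-up data:

```python
entries = ['English : 1, 2', 'Dutch : 5, 6']

# Iterating over a one-element tuple runs the inner "loop" exactly once,
# unpacking the split result into `lang` and `nums`
pairs = [(lang.strip(), nums.strip())
         for entry in entries
         for lang, nums in (entry.split(':', 1),)]
```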

answered Jan 21 '14 at 17:48 by Martijn Pieters · edited Jan 21 '14 at 18:01

I came up with this, but was too slow! :P +1 – Games Brainiac Jan 21 '14 at 17:52

@Martijn Pieters: Are long one-liners more pythonic? :) I would prefer the same code written with explicit `for` loops. – Sunny Nanda Jan 21 '14 at 17:52
@SunnyNanda: No, they are not, but sometimes the draw of the challenge is strong. – Martijn Pieters Jan 21 '14 at 17:55
`nums.split()` => `nums.split(',')` in the first expression – njzk2 Jan 21 '14 at 17:56
@njzk2: Indeed, I could've sworn I had that in my tested version already. – Martijn Pieters Jan 21 '14 at 18:02
Asking for a pythonic way to tackle this may have been an ambiguous request, judging by all the answers. I was looking for a nice, compact way to write this. Thanks, Martijn Pieters, this is definitely what I had in mind! – colicab Jan 21 '14 at 23:48
Done! Thanks for bringing it to my attention. I'm fairly new to SO and didn't know about this feature. – colicab Jan 21 '14 at 23:56
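The difference njzk2 pointed out matters: with no argument, `str.split()` splits on runs of whitespace, so the commas stay attached to the numbers, whereas `split(',')` splits on the commas and leaves only spaces to strip. A quick comparison:

```python
numbers = '1, 2, 3'

# No argument: split on whitespace, commas stay attached
by_whitespace = numbers.split()

# Explicit separator: split on commas, then strip the leftover spaces
by_comma = [n.strip() for n in numbers.split(',')]
```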

score 1 · Answer 2 · edited May 23 '17 at 10:31

Sorry, @colicab, I suppose you already know a lot of the things I'm mentioning. I just started writing an answer to you and ended up with a more general answer inspired by your challenge. This is probably a more personal answer than most on SO, but I suppose most people will agree that a lot of it is pythonic.

What is pythonic is always debatable to some extent. Let us look at some Python tools, usually considered pythonic (i.e., elegant and useful in a Python script), that can help us.

Functions

I'd say the most pythonic thing to do is to create functions generously. For example, you will have a string with numbers separated by commas, and will need to turn it into a list of ints. We should create a function to generate the list:

def parse_numbers(numbers_string):
    pass

# Little assert to check it works (it fails until we implement the function)
assert parse_numbers('1, 2, 3') == [1, 2, 3]

OK, our function does nothing for now, but we can implement it using...

`str.split()`

`str.strip()`

`int()` constructor

We can easily get a list of strings between the commas using the str.split() method:

>>> "1, 2, 3".split(',')
['1', ' 2', ' 3']

This alone does not solve our problem, however. First, the strings in the list have surrounding whitespace. We can remove it with the str.strip() method:

>>> '  1   '.strip()
'1'

We are closer to a list of ints, for sure, but not there yet. After all, '1' is a string containing a digit, not an integer number (and in Python those are very different things). Of course, we can easily solve it by using the int() constructor:

>>> int('1')
1
>>> int('  1   '.strip())
1
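As an aside, `int()` already ignores leading and trailing whitespace when parsing a string, so `strip()` is optional right before `int()`; it is still needed when the pieces stay strings, as in the question's desired output. Other stray characters, such as a leftover comma, are rejected:

```python
# int() tolerates surrounding whitespace, including newlines
parsed = int('  1   ')
parsed_newline = int(' 42\n')

# ...but not other stray characters, such as a leftover comma
try:
    int('1,')
    comma_rejected = False
except ValueError:
    comma_rejected = True
```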

Now, how can we apply this operation to every string in the list we got above?

List comprehensions

Often you will want to create lists from other lists. In other languages, we usually create an empty list and fill it with items in a loop. The pythonic way, however, involves list comprehensions. For example, the line below takes each element resulting from the split, strips the whitespace from it, and converts the stripped result to an int. The results of these operations are collected in a new list:

>>> [int(n.strip()) for n in '1, 2  , 3   '.split(',')]
[1, 2, 3]
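For comparison, this is the "create an empty list and fill it" version of the same operation; the comprehension builds an identical list in a single expression.

```python
numbers_string = '1, 2  , 3   '

# Explicit loop: start empty and append each converted item
loop_result = []
for n in numbers_string.split(','):
    loop_result.append(int(n.strip()))

# Equivalent list comprehension
comprehension_result = [int(n.strip()) for n in numbers_string.split(',')]
```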

Applying the same logic to the numbers_string parameter, we get the beautiful function below:

def parse_numbers(numbers_string):
    return [int(n.strip()) for n in numbers_string.split(',')]

# Little assert to ensure it works    
assert parse_numbers('1, 2, 3') == [1, 2, 3]

Simple, cohesive, clear: indeed, pythonic.

Now, what do we do next? We need to get the language name and the list of numbers. To do that, we go back to our first answer: a function! But to make it work, we will use the very pythonic...

Sequence packing and unpacking

Our next function will take a string like 'English : 1, 2' and return a pair suitable to be used in a dict constructor:

def parse_language(language_string):
    language, numbers_string = language_string.split(':')
    return language.strip(), parse_numbers(numbers_string)

# Little assert to ensure it works 
assert parse_language('English : 1, 2, 3') == ('English', [1, 2, 3])

We already know strip() and split(). The new magic is that the split(':') call returns a list with two values, and we can put them into two variables with a single assignment:

language, numbers_string = language_string.split(':')

This is called unpacking and is very pythonic. Also, note that the return statement is followed by two values separated by a comma. The result is a tuple containing the two values:

>>> parse_language('English : 1, 2, 3')
('English', [1, 2, 3])

The process by which two values become a single tuple is called packing.
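One caveat with unpacking: the number of names must match the number of values, otherwise Python raises `ValueError`. With plain `split(':')`, a string containing a second colon would break `parse_language()`; passing a maxsplit of 1, as Martijn Pieters' answer does with `split(':', 1)`, guarantees at most two parts. The input below is a made-up example with two colons:

```python
text = 'Note : see 10:30'  # hypothetical input containing two colons

# split(':') gives three parts, so unpacking into two names fails
try:
    label, rest = text.split(':')
    unpacked = True
except ValueError:
    unpacked = False

# maxsplit=1 splits only at the first colon, so unpacking succeeds
label, rest = text.split(':', 1)

# Packing: comma-separated values on the right form a tuple
pair = label.strip(), rest.strip()
```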

More functions

If only we had just one language per string... However, we have several languages per string, as in 'English : 1, 2, 3 ; Dutch : 5, 6, 7'. But we know the solution, right? Yes, a new function! This time we use everything we have learned: split(), strip(), list comprehensions...

def split_languages(languages):
    return [language.strip() for language in languages.split(';')]

# Little assert to ensure it works    
assert (
        split_languages('English : 1, 2; Dutch : 5, 7') ==
                ['English : 1, 2', 'Dutch : 5, 7']
)
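A possible refinement, not part of this answer and not needed for the question's data: if a string ever ended with a stray ';', `split(';')` would produce an empty piece that `parse_language()` could not unpack. A filtered variant (the name `split_languages_skipping_empty` is hypothetical) skips blank pieces:

```python
def split_languages_skipping_empty(languages):
    # Keep only the non-blank pieces between semicolons, stripped
    return [part.strip() for part in languages.split(';') if part.strip()]
```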

Of course, this gives us only a list of strings, not a dict. That is easy to solve using the very pythonic...

The `dict` constructor

As you may know, dicts can be created either with the {key: value ...} syntax or through the dict() constructor. The constructor has some very cool behaviors. One of them is creating dicts from a list of pairs. Consider the list below:

>>> l = [('key-1', 0), ('key-2', 'value'), ('key-3', 2)]

If we pass it to a dict constructor, we will get a dict like the one below:

>>> dict(l)
{'key-3': 2, 'key-2': 'value', 'key-1': 0}
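A few more ways to feed the `dict()` constructor, all standard Python: an iterable of pairs, keyword arguments, or a `zip()` of keys and values. When a key repeats, the last value wins:

```python
from_pairs = dict([('key-1', 0), ('key-2', 'value')])
from_keywords = dict(key1=0, key2='value')  # keys must be valid identifiers
from_zip = dict(zip(['key-1', 'key-2'], [0, 'value']))

# A repeated key keeps the last value seen
last_wins = dict([('a', 1), ('a', 2)])
```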

That's why parse_language() returns a tuple: we can use its results as key-value pairs. Combining it with a generator expression (similar to a list comprehension, but producing items lazily instead of building a whole list first) and the dict constructor, we can get all languages from a string this way:

def parse_languages(languages):
    return dict(
        parse_language(language) 
        for language in split_languages(languages)
    )

# Let's make sure everything works so far
assert (
    parse_languages('English : 1, 2; Dutch : 5, 7') == 
            {
                    'English' : [1, 2], 
                    'Dutch' : [5, 7]
            }
)
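When a generator expression is the only argument to a call, it needs no extra parentheses, which is why `dict(parse_language(language) for ...)` works. The sketch below uses a simplified, hypothetical `parse_pair()` as a stand-in for `parse_language()` to show that a generator expression, a list comprehension, and a dict comprehension build the same dict; the generator version simply avoids creating an intermediate list.

```python
def parse_pair(text):
    # Hypothetical stand-in for parse_language(): "English : 1, 2" -> ("English", [1, 2])
    name, numbers = text.split(':', 1)
    return name.strip(), [int(n) for n in numbers.split(',')]  # int() ignores surrounding spaces

parts = ['English : 1, 2', 'Dutch : 5, 7']

from_generator = dict(parse_pair(p) for p in parts)   # no intermediate list
from_list = dict([parse_pair(p) for p in parts])      # builds a list first
from_dict_comp = {name: nums for name, nums in map(parse_pair, parts)}
```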

Since each, let us say, "category" has a name (such as "person" or "home") and a "language string" parseable by parse_languages(), our next pythonic step is to use...

Another function

This new function brings nothing new, actually: unpacking is enough to save the day:

def parse_category(category_tuple):
    category, languages = category_tuple
    return category, parse_languages(languages)

# It is pythonic to test your functions, too! As you can see, asserts
# can help with that. They are not very popular, however... Go figure.
assert (
    parse_category( ('person', 'English : 1, 2; Dutch : 5, 7') ) ==
            (
                    'person',
                    {
                            'English' : [1, 2],
                            'Dutch' : [5, 7]
                    }
            )
)

Note that our parse_category() function returns a tuple. This is because we can combine a generator expression with the dict constructor to create a dict from all the tuples of your input. With that, we can write a very elegant, pythonic function:

def parse_tuples(tuples):
    return dict(parse_category(category) for category in tuples)

# No assert now. I have typed too much, I need some coffee :(
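To fill in the assert skipped above, here is a sketch that collects this answer's functions in one place and checks `parse_tuples()` against the question's data (with integers, since this answer converts the numbers). It also shows `parse_tuples_comprehension()`, a hypothetical alternative that unpacks each pair directly in a dict comprehension instead of going through `parse_category()`.

```python
def parse_numbers(numbers_string):
    return [int(n.strip()) for n in numbers_string.split(',')]

def parse_language(language_string):
    language, numbers_string = language_string.split(':')
    return language.strip(), parse_numbers(numbers_string)

def split_languages(languages):
    return [language.strip() for language in languages.split(';')]

def parse_languages(languages):
    return dict(parse_language(language) for language in split_languages(languages))

def parse_category(category_tuple):
    category, languages = category_tuple
    return category, parse_languages(languages)

def parse_tuples(tuples):
    return dict(parse_category(category) for category in tuples)

# Hypothetical alternative: unpack (category, languages) in the comprehension header
def parse_tuples_comprehension(tuples):
    return {category: parse_languages(languages) for category, languages in tuples}

tuplist = [('person', u'English : 1, 2, 3 ; Dutch : 5, 6, 7'),
           ('home', u'English : 8, 9, 10; Dutch: 11, 12, 13')]

expected = {'person': {'Dutch': [5, 6, 7], 'English': [1, 2, 3]},
            'home': {'Dutch': [11, 12, 13], 'English': [8, 9, 10]}}

assert parse_tuples(tuplist) == expected
assert parse_tuples_comprehension(tuplist) == expected
```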

But where should we put all these functions? In one of the most pythonic things ever:

Modules

In my case, I saved it all in a file named langparse.py. Now I can import it and call our parse_tuples() function:

>>> import langparse
>>> tuplist = [('person', u'English : 1, 2, 3 ; Dutch : 5, 6, 7'), ('home', u'English : 8, 9, 10; Dutch: 11, 12, 13')]
>>> langparse.parse_tuples(tuplist)
{'person': {u'Dutch': [5, 6, 7], u'English': [1, 2, 3]}, 'home': {u'Dutch': [11, 12, 13], u'English': [8, 9, 10]}}

Calling it from the interactive interpreter is, of course, just for testing, but the sky is the limit. Since the functions are in a module, I can use them anywhere else. Modules are so pythonic that the last line of the Zen of Python is a homage to them:

Namespaces are one honking great idea -- let's do more of those!

Of course, that line is not only about modules, but modules are surely one of the most important kinds of namespace in Python.

"Well, well," I can hear you wondering, "this is all cool and good but I just want to write a little (yet somewhat complicated) script! What should I do now, write another file just to call my module?" Not at all, my friend! You just need to use...

The `__name__ == "__main__"` idiom

In Python, you can retrieve the name of a module from the __name__ variable. It is always the name of the module... with one exception: if the module is run directly (instead of imported), the value of __name__ is "__main__". For example, if I create this simple module:

$ echo 'print(__name__)' > mymod.py

...and import it, the output will be its name:

$ python -c 'import mymod'
mymod

However, if I execute mymod.py directly, the output is __main__:

$ python mymod.py
__main__

So, we can add the following lines at the end of our module. The code will then run when the module is executed directly, but not when it is imported:

if __name__ == "__main__":
    tuplist = [
        ('person', u'English : 1, 2, 3 ; Dutch : 5, 6, 7'),
        ('home', u'English : 8, 9, 10; Dutch: 11, 12, 13')
    ]

    print(parse_tuples(tuplist))

Let's see. Here is the result on my machine:

$ python langparse.py 
{'person': {u'Dutch': [5, 6, 7], u'English': [1, 2, 3]}, 'home': {u'Dutch': [11, 12, 13], u'English': [8, 9, 10]}}

Conclusion

Python has a lot of cool features that make programming a pleasure. These include useful methods, modules, lists, tuples, dicts, list comprehensions (as well as generator expressions passed to dict()!), unpacking... Learning all these idioms will make everything easier. However, if you have to use one and only one thing, please use functions and modules. Even if you create too many of them, it is easier to merge them than to split them.

And, after all, if you have doubts about what to do, just remember our supreme guide, the Zen of Python. Yes, this is kinda humorous, but it is for real, too.

The full script can be seen in Pastebin.

That's what I like about SO, though. You have a question and people are willing to share their knowledge and experience with you. Thanks, brandizzi, for this comprehensive answer. This illustrates that 'pythonic' indeed has multiple interpretations... — colicab, Jan 21 '14 at 23:53
@colicab I'm glad you liked it. I was afraid of sounding patronizing by explaining so many concepts. I was also afraid of being a bit offensive by focusing so much on functions, especially because I tried to be funny (I know, not much success there). However, as I said, it became a general answer. Also, I was eager to counterbalance the excessive emphasis on one-liners in Python :) Anyway, I intend to revisit this answer to fix its problems soon. — brandizzi, Jan 21 '14 at 23:59
Definitely not. I was very glad to get such an elaborate answer. Although I might not need it anymore, it wasn't long ago that these kinds of SO topics helped me a lot to understand basic and important concepts without reading through lengthy documentation. Some repetition can be skipped, but you always learn something new. I do realize that I often conflate asking for a one-liner with asking for a Pythonic way :-) — colicab, Jan 22 '14 at 00:11

score 0 · Answer 3 · answered Jan 21 '14 at 17:45


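def dictify(s):
    # Split "Lang : a, b ; Lang2 : c" on ';', split each part once on ':',
    # and strip whitespace from the language name and from each number
    return dict((lang.strip(), [n.strip() for n in nums.split(",")])
                for lang, nums in (part.split(":", 1) for part in s.split(";")))

dict((x[0], dictify(x[1])) for x in tuplist)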

is how I would probably do it ...

answered by Joran Beasley
