I'm looking for a more pythonic way to get the maximal length of the values for each key in a list of dictionaries.

My approach looks like this:

lst =[{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct = {}
for l in lst:
    for key in l:
        dct.update({key: max(dct.get(key,0), len(str(l.get(key,0))))})
print(dct)

The output is:

{'b': 6, 'a': 11}

The str function is needed to get the length of integers (and also of None values).
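As a minimal sketch of why str is needed here (the sample values are only illustrative): calling len directly on an int or on None raises a TypeError, while converting to a string first gives the number of characters in the printed form.

```python
# Illustrative values only: a string, an int, and None
values = ['asdasd', 123, None]

# len(123) or len(None) would raise TypeError, so convert to str first
lengths = [len(str(v)) for v in values]
print(lengths)  # [6, 3, 4]  ('None' has four characters)
```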

Is this approach "pythonic", or is there a smoother, more readable way using list comprehensions or similar methods?

Alex
Quickbeam2k1
  • is 123 supposed to be an int or string? – SirParselot Nov 10 '15 at 16:44
  • are all the keys in each of the dicts in lst the same? – Chad S. Nov 10 '15 at 16:45
  • @SirParselot, I don't know about ints. However, I want to handle Nones, so I used the str function. @Chad S. The second dict in lst does not have the same keys as the others. However, if you have a solution that relies on this assumption, that is also fine – Quickbeam2k1 Nov 10 '15 at 16:53
  • I would argue that your example (and all given answers) fail to be pythonic because although they use python idioms they are not "clear": http://stackoverflow.com/questions/25011078/what-does-pythonic-mean - perhaps you could be clearer about your desire to be pythonic – James Harrison Nov 10 '15 at 17:21
  • I quote from your link: "So essentially when someone says something is unpythonic, they are saying that the code could be re-written in a way that is a better fit for pythons coding style." I'm looking for such a solution. Since I'm new to Python, I want to learn more about efficient and clear ways to code. For example, my code is not very readable. Especially the update and max part might be confusing. – Quickbeam2k1 Nov 10 '15 at 17:28
  • @Quickbeam2k1 - http://legacy.python.org/dev/peps/pep-0020/ - if you want readability, both answers so far ignore that in favour of 'compactness'. This is a case where over-using python features makes things less readable. – James Harrison Nov 10 '15 at 17:31
  • So what do you suggest? From "Flat is better than nested", I infer that my code also does not follow the Zen of Python. Furthermore, what about "There should be one-- and preferably only one --obvious way to do it."? Would this be the "most" pythonic solution (whatever it might look like)? – Quickbeam2k1 Nov 10 '15 at 17:44
  • "Flat is better than nested" - things can be nested on a single line - this is much worse than just using a small amount of indentation. A good measure of the nestedness of your code is the number of sequential closing brackets. – James Harrison Nov 10 '15 at 18:24
  • To be honest: I don't know which answer to choose :) I think each one has its advantages in some ways. I'll think about it or let the community decide via voting :) – Quickbeam2k1 Nov 12 '15 at 21:51

5 Answers

I think your approach is fairly Pythonic except that I would change the update line to be a little more clear:

# A little terse
dct.update({key: max(dct.get(key,0), len(str(l.get(key,0))))})
# A little simpler
dct[key] = max(dct.get(key, 0), len(str(l[key])))

Here's a solution with variable names modified as well:

dict_list =[{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
max_lengths = {}
for dictionary in dict_list:
    for k, v in dictionary.items():
        max_lengths[k] = max(max_lengths.get(k, 0), len(str(v)))
print(max_lengths)
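If the loop is needed in more than one place, it could be wrapped in a small function. This is only a sketch: the name max_value_lengths is hypothetical and not part of the original code.

```python
def max_value_lengths(dicts):
    """Return {key: longest len(str(value))} over all dictionaries.

    max_value_lengths is a hypothetical helper name used for illustration.
    """
    result = {}
    for dictionary in dicts:
        for key, value in dictionary.items():
            # Keep the larger of the stored length and the current one
            result[key] = max(result.get(key, 0), len(str(value)))
    return result


dict_list = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]
print(max_value_lengths(dict_list))
```

An empty input list simply yields an empty dictionary, since the loop body never runs.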
Trey Hunner

My previous answer was wrong, which I did not realize at first, but here are two others that do work. The first one uses pandas: it creates a DataFrame, sorts by key and then by value (descending), takes the first value of each group, and creates a dictionary out of that.

import pandas as pd
lst = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]

d = pd.DataFrame([(k,len(str(v))) for i in lst for k,v in i.items()], columns=['Key','Value'])
# DataFrame.sort was removed from pandas; sort_values is the current equivalent
d = d.sort_values(['Key','Value'], ascending=[True, False])
d = d.groupby('Key').first().reset_index()
d = dict(zip(d.Key, d.Value))  # or d.set_index('Key')['Value'].to_dict()
print(d)

{'a': 11, 'b': 6}

If you want something that is easily readable and uses only built-ins, then this should do:

lst = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct={}

for i in lst:
    for k,v in i.items():
        if k in dct:
            if len(str(v)) > dct[k]:
                dct[k] = len(str(v))
        else:
            dct[k] = len(str(v))
print(dct)

{'a': 11, 'b': 6}
SirParselot
  • So if I compare your answer to my suggestion, the essential difference is that my update-construct is replaced by a clever if-else construct, right? – Quickbeam2k1 Nov 11 '15 at 19:48
  • @Quickbeam2k1 Essentially yes, it still loops through the list but gets the keys and values of the dictionaries and replaces your one-liner with an easy to read if-else statement – SirParselot Nov 11 '15 at 20:17

Here's another way that doesn't rely on sorting/zipping but I wouldn't say one is more Pythonic than the other.

from itertools import chain

lst =[{'a':'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct = {
    k: max(len(str(d.get(k, ""))) for d in lst)
    for k in set(chain.from_iterable(d.keys() for d in lst))
}

print(dct)

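Alternatively, you can use groupby. Note that groupby only groups consecutive items, so the pairs have to be sorted by key first:

from itertools import chain, groupby

lst =[{'a':'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct = {
    k: max(len(str(v)) for _, v in g)
    for k, g in groupby(
        # sort on the key only, so values of mixed types are never compared
        sorted(chain.from_iterable(d.items() for d in lst), key=lambda p: p[0]),
        lambda p: p[0]
    )
}

print(dct)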
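One caveat with itertools.groupby is worth illustrating: it starts a new group every time the key changes, so a key that reappears later overwrites its earlier entry in the dict comprehension. The following sketch uses made-up data and a hypothetical helper by_key to show the difference between unsorted and sorted input.

```python
from itertools import chain, groupby

# Made-up data where key 'a' appears in non-adjacent positions
lst = [{'a': 'xxxxx'}, {'b': 1}, {'a': 'x'}]
pairs = list(chain.from_iterable(d.items() for d in lst))


def by_key(pair):
    # Hypothetical helper: group and sort on the dictionary key only
    return pair[0]


# Without sorting, the second 'a' group replaces the first one
unsorted_result = {
    k: max(len(str(v)) for _, v in g)
    for k, g in groupby(pairs, by_key)
}

# Sorting by key first makes all pairs with the same key adjacent
sorted_result = {
    k: max(len(str(v)) for _, v in g)
    for k, g in groupby(sorted(pairs, key=by_key), by_key)
}

print(unsorted_result)  # {'a': 1, 'b': 1}
print(sorted_result)    # {'a': 5, 'b': 1}
```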
Steven

The other answers focus on using python features rather than readability. Personally I'm of the opinion that readability and simplicity are the most important of all the 'pythonic' traits.

(I simplified to use strings for everything, but it would work with integers as well if you drop in a str())

from collections import defaultdict

lst =[{'a':'asdasd', 'b': '123'},{'b': 'asdasdasdas'}, {'a':'123','b':'asdasd'}]

def merge_dict(dic1, dic2):
    for key, value in dic2.items():
        dic1[key].append(value)

combined = defaultdict(list)
for dic in lst:
    merge_dict(combined, dic)

print({key: max(map(len, value)) for key, value in combined.items()})
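The remark about dropping in a str() could look roughly like this. It is a sketch using the data from the question (which mixes strings and integers), with the merge step inlined rather than kept in a separate function.

```python
from collections import defaultdict

# Data from the question, with integers among the values
lst = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]

# Collect every value per key, converted to a string up front
combined = defaultdict(list)
for dic in lst:
    for key, value in dic.items():
        combined[key].append(str(value))

# The longest string per key gives the requested maximum length
result = {key: max(map(len, values)) for key, values in combined.items()}
print(result)  # {'a': 11, 'b': 6}
```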
  • The first step is just about merging the dicts into a 'multi value map' - the final line does all the actual logic requested – James Harrison Nov 10 '15 at 18:26
  • My second answer only uses built in modules and is easily readable. – SirParselot Nov 10 '15 at 19:07
  • @SirParselot - I agree the second answer is fairly readable. While the first answer is clever, it fails on many of the other pythonic traits. If the second answer was on its own, I would have upvoted that. – James Harrison Nov 11 '15 at 09:11

I like this take for its readability and its use of plain Python:

dicts = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]

def get_highest(current_highest, items_left):
    # Base case: nothing left to process
    if not items_left:
        return current_highest
    item = items_left.pop()  # note: this consumes the caller's list
    # Keep only the keys whose value is longer than the best seen so far
    higher = {
        key: len(str(value))
        for key, value in item.items()
        if len(str(value)) > current_highest.get(key, 0)
    }
    current_highest.update(higher)
    return get_highest(current_highest, items_left)

print(get_highest(dict(), dicts))

{'b': 6, 'a': 11}
JoGr
  • I'm not saying this is the most 'pythonic', it's just descriptive to my eyes. I think Trey Hunner's second example is the best example of readability while being short, though. – JoGr Nov 13 '15 at 09:22