1

I want to transform a dictionary into a string. What would otherwise be a beginner-level question is complicated by a few rules that I have to adhere to:

  • There is a list of known keys that must come out in a particular, arbitrary order
  • Each of the known keys is optional, i.e. it may not be present in the dictionary
  • It is guaranteed that at least one of the known keys will be present in the dictionary
  • The dictionary may contain additional keys; they must come after the known keys, and their order is not important
  • I cannot make assumptions about the order in which keys will be added to the dictionary

What is the pythonic way of processing some dictionary keys before others?

So far, I have the following function:

def format_data(input_data):
    data = dict(input_data)
    output = []
    for key in ["title", "slug", "date", "modified", "category", "tags"]:
        if key in data:
            output.append("{}: {}".format(key.title(), data[key]))
            del data[key]

    if data:
        for key in data:
            output.append("{}: {}".format(key.title(), data[key]))

    return "\n".join(output)

data = {
    "tags": "one, two",
    "slug": "post-title",
    "date": "2017-02-01",
    "title": "Post Title",
}

print(format_data(data))

data = {
    "format": "book",
    "title": "Another Post Title",
    "date": "2017-02-01",
    "slug": "another-post-title",
    "custom": "data",
}

print(format_data(data))

Title: Post Title
Slug: post-title
Date: 2017-02-01
Tags: one, two

Title: Another Post Title
Slug: another-post-title
Date: 2017-02-01
Custom: data
Format: book

While this function does provide the expected results, it has some issues that make me think there might be a better approach. Namely, the output.append() line is duplicated, and the input data structure is copied to allow its modification without side effects.

To sum up, how can I process some keys in particular order and before other keys?

smci
Mirek Długosz
  • 1
    I don't see anything particularly wrong with your implementation. The `if data:` is unnecessary, but that's about it. – glibdud Feb 01 '17 at 18:54
  • 1
    Do you actually want to `del data[key]` as you iterate (which is bad practice) or do you just do that to ensure the known keys don't get iterated over twice? – smci Feb 01 '17 at 18:54
  • There's also no need to have `dict(input_data)`, unless there's a chance the input isn't already a dictionary (which is not represented in your example). – skrrgwasme Feb 01 '17 at 18:55
  • 2
    @skrrgwasme (and also smci): `data = dict(input_data)` is there to prevent `del data[key]` from modifying the input. – user2357112 Feb 01 '17 at 18:57
  • @user2357112 That's a good point that I didn't see. – skrrgwasme Feb 01 '17 at 18:59
  • @smci: `del` is there to ensure that known keys don't get iterated over twice. – Mirek Długosz Feb 01 '17 at 19:52
  • @MirosławZalewski: I know why you're doing that, I'm saying it's bad practice to delete/modify as you iterate. Better for your second loop to iterate over `set(d.keys()) - set(['title', 'slug', 'date', 'modified', 'category', 'tags'])` – smci Feb 02 '17 at 03:32
  • @smci - items are Not being deleted from the thing that is being iterated over. – wwii Feb 02 '17 at 04:20
  • @wwii: Yes they are. `if key in data: ... del data[key]` is asking for trouble. It's not thread-safe. Taking the copy `data = dict(input_data)` is unnecessary bloat just to prevent deleting from the actual input, let's assume we already got rid of that line. – smci Feb 02 '17 at 04:26
  • @user2357112: Taking the copy `data = dict(input_data)` is kludgy and using 2x memory. – smci Feb 02 '17 at 04:28
  • @smci `if key in data:` is a conditional not an iteration. – wwii Feb 02 '17 at 04:33
  • @smci: It's only a shallow copy. It doesn't need to copy the key or value objects, so it's much less than 2x memory. – user2357112 Feb 02 '17 at 04:38
  • @wwii: `del data[key]` is an operation. It's inside the iteration `for key in ["title"...]` It will remove `key` from the keys of `data`. Doing this inside the iteration is not thread-safe and is bad coding style. Especially in cases as here where it's avoidable. – smci Feb 02 '17 at 04:50
  • @smci: Not *thread*-safe? What? Even if there were other threads running, `data` is only visible to a single thread. I don't see any thread-safety concerns with `del data[key]`. – user2357112 Feb 02 '17 at 04:54
  • @user2357112: as I already wrote you above **Taking the copy `data = dict(input_data)` is unnecessary bloat just to prevent deleting from the actual input, let's assume we already got rid of that line**. There is zero reason why we can't operate on `input_data` directly. (I take it we're supposed to be addressing the intent of the code, not the code as written). – smci Feb 02 '17 at 04:57
  • @smci: You can't criticize the code for bugs you introduced! You can't take out the safety measures and then criticize the code for not being safe. It's not even a substantial degree of memory bloat - the memory consumption is comparable to that involved in your proposed `set(d.keys()) - set(['title', 'slug', 'date', 'modified', 'category', 'tags'])` computation. – user2357112 Feb 02 '17 at 05:07
  • @user2357112: I didn't "introduce" a bug. The code bloats memory and probably leaks memory due to taking an unnecessary copy. It's possible to avoid those so we don't end up debating the lesser of the evils. We don't know that the bloat isn't worse, perhaps the OP modifies some of the items in the dict copy. Defining a `CustomOrderedDict` and overriding `__iter__()` seems like the solution. That sort of thing has been asked before: **[Python: How to “perfectly” override a dict](http://stackoverflow.com/questions/3387691/python-how-to-perfectly-override-a-dict)**. Except for OrderedDict. – smci Feb 02 '17 at 05:17
  • @smci: "probably leaks memory" - there is no way this code leaks memory. "perhaps the OP modifies some of the items in the dict copy" - we can see everything they're doing with the copy, and they're not doing that. "Defining a CustomOrderedDict and overriding __iter__() seems like the solution" - that would be absurdly overengineered for a use case as simple as this, and it just shoves the problem into the `__iter__` method. – user2357112 Feb 02 '17 at 05:48
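The shallow-copy point raised in the comments can be checked directly. The following is a minimal sketch, not part of the original thread; the names `big_value`, `original` and `shallow` are made up for illustration. It shows that `dict(...)` creates a new dictionary that still references the same value objects, and that deleting from the copy leaves the input alone.

```python
# Hypothetical data: one large value to make the point about memory visible.
big_value = "x" * 1_000_000
original = {"title": big_value, "slug": "post-title"}

# dict() makes a shallow copy: a new mapping, but the same key/value objects.
shallow = dict(original)

same_dict = shallow is original                      # False: distinct dict objects
same_value = shallow["title"] is original["title"]   # True: the string is shared

# Deleting from the copy does not touch the original.
del shallow["title"]
still_in_original = "title" in original              # True
```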

4 Answers

2

I suggest that you simply run a pair of list comprehensions: one for the desired keys, and one for the rest. Concatenate them in the desired order in bulk, rather than one at a time. This reduces the critical step to a single command to build output.

The first comprehension looks for desired keys in the dict; the second looks for any dict keys not in the "desired" list.

def format_data(input_data):
    data = dict(input_data)
    key_list = ["title", "slug", "date", "modified", "category", "tags"]
    output = ["{}: {}".format(key.title(), data[key]) for key in key_list if key in data] + \
             ["{}: {}".format(key.title(), data[key]) for key in data if key not in key_list]
    return "\n".join(output)
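As a variation on the same two-comprehension idea, the sketch below builds the ordered list of keys first and formats them in one place, so the format string appears only once and the input is never copied or modified. This is one possible arrangement rather than the answer's own code; the constant `KNOWN_KEYS` and the `known_keys` parameter are names introduced here for illustration.

```python
KNOWN_KEYS = ["title", "slug", "date", "modified", "category", "tags"]

def format_data(input_data, known_keys=KNOWN_KEYS):
    # A set makes the "is this a known key?" test cheap for the second pass.
    known = set(known_keys)
    # Known keys first, in the prescribed order, skipping any that are absent.
    ordered = [key for key in known_keys if key in input_data]
    # Remaining keys afterwards, in whatever order the dict yields them.
    extra = [key for key in input_data if key not in known]
    # Single formatting step; input_data is only read, never changed.
    return "\n".join(
        "{}: {}".format(key.title(), input_data[key]) for key in ordered + extra
    )
```

Calling `format_data(data)` with the first example dictionary from the question should give the same four lines as the original function.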
Prune
0

I think I see what you mean now, so I have rewritten this answer. The function below takes a list of primary keys (you can pass them in or set them in a config file) and places them at the beginning of an ordered dictionary, followed by any remaining keys.

Try this:

from collections import OrderedDict
data = {'aaa': 'bbbb',
 'custom': 'data',
 'date': '2017-02-01',
 'foo': 'bar',
 'format': 'book',
 'slug': 'another-post-title',
 'title': 'Another Post Title'}

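def format_data(input_data):
    primary_keys = ["title", "slug", "date", "modified", "category", "tags"]
    # list() is required on Python 3, where keys() returns a view that cannot be added to a list.
    # Duplicate keys keep the position of their first occurrence, so primary keys stay in front.
    data = OrderedDict((k, input_data.get(k)) for k in primary_keys + list(input_data))
    output = []
    for key, value in data.items():
        # Missing primary keys come back from .get() as None and are skipped here
        # (note that this check also skips empty values).
        if value:
            output.append("{}: {}".format(key.title(), value))
    return "\n".join(output)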

print(format_data(data))

Title: Another Post Title
Slug: another-post-title
Date: 2017-02-01
Aaa: bbbb
Format: book
Custom: data
Foo: bar
Kelvin
  • I like the `OrderedDict` idea, but you shouldn't delete things from collections while iterating over them. Build a list of the keys first, then you can freely modify the dictionary while iterating over the list. In fact, smci's comment on the question is getting at a good point - the delete was probably there just to prevent iterating over things twice. It may not be necessary any more with an ordered dict. – skrrgwasme Feb 01 '17 at 18:53
  • Thanks for the answer. However, I cannot make assumptions about the order in which keys will be added to the input dictionary (I read the source file line by line and the order in the source is random). I think I can create a template OrderedDict with all known keys added in the expected order, but then again - the source file may be missing some of the known keys. I have added these requirements to my question - can you try and update your answer to solve these problems as well? – Mirek Długosz Feb 01 '17 at 20:02
0

I'd suggest list comprehensions and pop():

def format_data(input_data):
    data = dict(input_data)

    keys = ["title", "slug", "date", "modified", "category", "tags"]
    output = ['{}: {}'.format(key.title(), data.pop(key)) for key in keys if key in data]

    output.extend(['{}: {}'.format(key.title(), val) for key, val in data.items()])

    return "\n".join(output)

To the concern about deleting during iteration - note that the iteration is over the list of keys, not the dictionary being evaluated, so I wouldn't consider that a red flag.
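The distinction can be illustrated with a short sketch (the dictionaries here are made-up examples, and the error behaviour described is that of CPython 3): deleting from a dict while looping over a separate list works, whereas deleting from the dict that is itself being iterated raises an error.

```python
data = {"title": "Post Title", "slug": "post-title", "custom": "data"}

# Safe: the loop walks a separate list, so the dict may shrink freely.
for key in ["title", "slug", "date"]:
    if key in data:
        del data[key]
# Only the unknown key is left in data at this point.

# Unsafe: the loop walks the same dict that is being modified.
unsafe = {"a": 1, "b": 2}
error = None
try:
    for key in unsafe:
        del unsafe[key]
except RuntimeError as exc:
    # CPython complains that the dictionary changed size during iteration.
    error = str(exc)
```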

bimsapi
0

Find the difference between the keys in the input dictionary and the known keys; use itertools.chain to iterate over both sets of keys; catch the KeyError for any known key that is missing and just pass. No need to copy the input and no duplication.

import itertools
def format_data(input_data):
    known_keys = ["title", "slug", "date", "modified", "category", "tags"]
    xtra_keys = set(input_data.keys()).difference(known_keys)
    output = []
    for key in itertools.chain(known_keys, xtra_keys):
        try:
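            # Read from the function argument, not from a global named data.
            output.append("{}: {}".format(key.title(), input_data[key]))
        except KeyError:
            # A known key that is absent from the input is simply skipped.
            pass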
    return '\n'.join(output)

data = {"tags": "one, two",
        "slug": "post-title",
        "date": "2017-02-01",
        "title": "Post Title",
        "foo": "bar"}

>>> print(format_data(data))
Title: Post Title
Slug: post-title
Date: 2017-02-01
Tags: one, two
Foo: bar
>>>
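A different technique that also avoids copying the input and duplicating the format line is a single sort with a rank function. This is a hedged alternative sketch, not the approach of the answer above; `KNOWN_KEYS`, `RANK` and `format_data_sorted` are names invented for the example. Known keys sort by their position in the list, and all unknown keys share one rank that places them last.

```python
KNOWN_KEYS = ["title", "slug", "date", "modified", "category", "tags"]
# Map each known key to its position in the required order.
RANK = {key: index for index, key in enumerate(KNOWN_KEYS)}

def format_data_sorted(input_data):
    # Unknown keys all get the same rank, len(RANK), so they sort after known keys.
    # sorted() is stable, so unknown keys keep the order the dict yields them in.
    ordered = sorted(input_data, key=lambda key: RANK.get(key, len(RANK)))
    return "\n".join(
        "{}: {}".format(key.title(), input_data[key]) for key in ordered
    )
```

Because only keys present in the input are sorted, no membership check or exception handling is needed for missing known keys.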
wwii