
I have a daily process that writes a JSON file:

{
  "oldfield1": 1,
  "oldfield2": "a"
}

and a Python script that reads these files, each into a single object:

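import collections
import json
import os

# One record per daily file; the field names must match the JSON keys.
MyRecord = collections.namedtuple("MyRecord", ["oldfield1", "oldfield2"])
mydata = {}
for fname in os.listdir("mydir"):
    # os.listdir returns bare file names, so join them with the directory.
    with open(os.path.join("mydir", fname)) as fd:
        mydata[fname] = MyRecord(**json.load(fd))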

Tempora mutantur, and newer files now have an extra field:

{
  "oldfield1": 3,
  "oldfield2": "f",
  "newfield": [1, 2, 3]
}

and now the code above fails with this error:

TypeError: __new__() got an unexpected keyword argument 'newfield'

I can add newfield to MyRecord, but then it will fail on the old files.

What is the best approach?

  1. I can add newfield to MyRecord and set MyRecord.__new__.__defaults__.

  2. I can sanitize the dict.
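
For concreteness, here is a minimal sketch of both options. The helper name sanitize and the key "futurefield" are only illustrative, and None as the default value is an assumption; any sentinel would do. On Python 3.7 and later, namedtuple also accepts a defaults= argument, which may be cleaner than assigning to __new__.__defaults__.

```python
import collections

MyRecord = collections.namedtuple(
    "MyRecord", ["oldfield1", "oldfield2", "newfield"]
)

# Option 1: give every field a default of None (an assumed sentinel),
# so old files that lack "newfield" can still be loaded.
MyRecord.__new__.__defaults__ = (None,) * len(MyRecord._fields)


def sanitize(raw):
    """Option 2: keep only the keys that MyRecord declares (hypothetical helper)."""
    return {k: v for k, v in raw.items() if k in MyRecord._fields}


# An old-style file: "newfield" falls back to the default.
old = MyRecord(**{"oldfield1": 1, "oldfield2": "a"})

# A future file: the unknown "futurefield" key is dropped before construction.
new = MyRecord(**sanitize(
    {"oldfield1": 3, "oldfield2": "f", "newfield": [1, 2, 3], "futurefield": 0}
))
```

The two options are not mutually exclusive: defaults cover fields missing from old files, while sanitizing covers fields the record does not know about yet.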

Priorities:

  1. I want to do as little as possible: minimize my code modifications (this implies option 1).
  2. I also want to minimize future code modifications: when "futurefield" is added, I want, ideally, not to have to do anything (this implies option 2).
  3. Most important, I want to keep the code maintainable (option 1?).

Personally, I like the first approach better. However, I would love to hear what people think.

sds
  • option 3: use bunch, which is an object that just exposes dict keys as attributes. it's pretty convenient, and will work in both cases. it will probably be significantly slower though than the existing implementation (at least the recipes I've used - adding some cython might improve it). – Corley Brigman Jul 07 '17 at 14:54
  • option 4: subclass namedtuple to change the init to throw away any input keys that it doesn't recognize. this probably has the same caveat that it will likely be slower (namedtuple is mostly C by now as far as i know, the subclass will be mostly Python). may not be that much slower though, since you are only overriding the init. – Corley Brigman Jul 07 '17 at 14:55
  • @CorleyBrigman: thanks, I don't want to install an extra package. let me edit the question to clarify my priorities. – sds Jul 07 '17 at 15:30
  • subclassing namedtuple wouldn't require installing any extra packages, just writing some code. same for writing a small subclass of dict that just implements `__getattr__` and reflects to `__getitem__`... – Corley Brigman Jul 07 '17 at 15:48
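
The two ideas from the comments above could look roughly like this. This is a sketch, not a tested recommendation: the class names LenientRecord and AttrDict are made up for illustration, and because tuples are immutable the override has to go in `__new__` rather than `__init__`.

```python
import collections

_Base = collections.namedtuple("MyRecord", ["oldfield1", "oldfield2"])


class LenientRecord(_Base):
    """namedtuple subclass that silently drops unknown keys (hypothetical name)."""

    __slots__ = ()  # keep instances as lightweight as the base tuple

    def __new__(cls, **kwargs):
        known = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super().__new__(cls, **known)


class AttrDict(dict):
    """dict that also exposes its keys as attributes (hypothetical name)."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


raw = {"oldfield1": 3, "oldfield2": "f", "newfield": [1, 2, 3]}
rec = LenientRecord(**raw)   # "newfield" is discarded
bunch = AttrDict(raw)        # "newfield" is kept and reachable as bunch.newfield
```

The trade-off is that LenientRecord keeps a fixed, documented set of fields but hides new data, whereas AttrDict keeps everything but gives up the immutability and the fixed schema of a namedtuple.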
