5

What is the normal style for data objects in Python? Let's say I have a method that gets a customer from somewhere (net, DB, ...). What type of object should I return? I see several choices:

  • a tuple
  • a dictionary
  • a class instance (are data classes 'normal'?)

I am sure there are others. This is my first big Python project, so I would like to use best practices from the start.

Wow, I'm surprised at the negative reaction to the question. Maybe I was not being clear.

I have lots of different data items I want to pass around my code: user, product, customer, order, ... (in fact they are nothing like that, but it's simpler with obvious types of thing). So I have

def get_user():
  return x

What should x be: an instance of a class called user, a dict, a tuple, ...?

There are no methods associated with the objects; they are pure data.

It seems like namedtuples are the way to go.

Edit: how about this as a style?

class product:
   pass

...

def get_product():
   # ... db read stuff, assumed to produce dbthing
   pr = product()
   pr.name = dbthing[0]
   pr.price = dbthing[1]
   return pr

Is that barf-inducing, or well-established style, or odd, or what? It works. From the consumer side it makes for readable code:

def xxx():
  pr = get_product()
  total = amount * pr.price
pm100
  • What on earth is a "data object"? Objects are supposed to model the real world, and in the real world I have never encountered such a thing. Also, I can't imagine an entire app using only one data structure for every return value for everything. Why would you do that? Use whatever makes sense for every function or method you write. – Two-Bit Alchemist Apr 10 '14 at 23:10
  • Plus why would you even separate these concepts? A class can contain dicts which contain lists of dicts that contain tuples. You will find many times you are using tuples in Python implicitly when you don't even mean to be, like whenever you do multiple assignment, or return several values from a function. – Two-Bit Alchemist Apr 10 '14 at 23:11
  • @Two-BitAlchemist: I frequently encounter objects in the real world with no behaviour worth speaking of. The most obviously close analogy to what the questioner is talking about is a printed paper form with some information filled in. Of course, other objects' behaviour will be affected by what's on the form. – Steve Jessop Apr 10 '14 at 23:11
  • @SteveJessop I wouldn't call a partially filled in form a "data object". I might call it, e.g., a [form](https://docs.djangoproject.com/en/dev/topics/forms/#form-objects)... – Two-Bit Alchemist Apr 10 '14 at 23:13
  • @Two-BitAlchemist: you said *the real world*. In the real world, a piece of paper does very few of the things that a form on a website does. Your link is not relevant. Of course, if you didn't really mean that objects should model the real world, rather they should model computer UIs, then that's a reasonable position too but please don't say you've *never encountered* printed paper even if you think it's too stupid to use ;-) – Steve Jessop Apr 10 '14 at 23:14
  • If there are methods, make it a class ... if it is just named data, use a dictionary ... if the order of the data is important, use a tuple or list ... – Joran Beasley Apr 10 '14 at 23:14
  • @SteveJessop Look at the OP's question. "Let's say I have a method that gets a customer from somewhere (net, DB)... What kind of object should it return?" A Customer object. Whether that's implemented as a class or just sort of represented as a dict or a tuple or an encoded string really doesn't matter. As someone who works with Python and Django every day IMO if you want "data objects" maybe take a look at Java. – Two-Bit Alchemist Apr 10 '14 at 23:17
  • @SteveJessop I don't think what's important about the form in your example is that it's a piece of paper. What do you think web forms are supposed to be modeling anyway?! – Two-Bit Alchemist Apr 10 '14 at 23:18
  • @Two-BitAlchemist: If it's a printed piece of paper with my name and address filled into a form, like a packing slip, then web forms aren't modelling it terribly accurately. Maybe objects *shouldn't* model the real world, they should model something better than the real world because so many real-world constraints can be ignored if you choose. The real paper, for example. – Steve Jessop Apr 10 '14 at 23:26

3 Answers

2

For simple data records you should generally think about collections.namedtuple first, and use one of the other options if that's not suitable for any reason. Once you know why it's not suitable that generally suggests which of the others to use. You can think of collections.namedtuple as being a shortcut for quickly defining immutable "data classes".

Taking a database as an example, if you're using an ORM then even the simplest records will be represented as objects[*], and with good reason: they all have in common some actions you can perform on them, such as storing changes back to the database. The tutorial/documentation for your ORM will guide you. If you're using the Python DB-API directly then rows from SQL queries will come back (initially) as tuples, but of course you can do what you like with those once you have them. The database connector can also provide ways to transform rows before they are handed back to you, for example by setting the row factory in sqlite3.
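As a rough sketch of the sqlite3 row factory mentioned above: the table, the column names and the `Product` record here are made up for illustration, and other connectors may offer different hooks.

```python
import sqlite3
from collections import namedtuple

# Hypothetical record type; the field names are illustrative only.
Product = namedtuple("Product", "name price")

def product_row_factory(cursor, row):
    # sqlite3 calls the row factory with the cursor and the raw row tuple.
    return Product(*row)

conn = sqlite3.connect(":memory:")
conn.row_factory = product_row_factory
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES (?, ?)", ("widget", 9.5))

# Rows now arrive as Product records instead of plain tuples.
product = conn.execute("SELECT name, price FROM product").fetchone()
print(product.name, product.price)
conn.close()
```

A factory like this only fits queries that select exactly those two columns; for general use, the built-in `sqlite3.Row` gives access by column name without defining a record type.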

Taking "the net" as an example -- well, there are many means of data interchange, but one common example is accessing an API that returns JSON data. In that case there's not much choice but to initially represent this data in your program the same way that it was structured as JSON: as lists and dictionaries containing lists, dictionaries, strings and numbers. Again, you can do what you like with this once you have it. It's fairly normal to work with it as it is, and it's also fairly normal to restructure it into something else straight away. Both personal preference and the particular circumstances affect which you actually choose.
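To make the two JSON options concrete, here is a small sketch. The payload and the `Customer` record are invented for the example and do not come from any real API.

```python
import json
from collections import namedtuple

# Hypothetical record type for the fields we care about.
Customer = namedtuple("Customer", "name email")

# Made-up payload standing in for an API response body.
payload = '{"name": "Ada", "email": "ada@example.com", "tags": ["vip"]}'

raw = json.loads(payload)  # plain dicts, lists, strings and numbers

# Option 1: work with the decoded structure as it is.
first_tag = raw["tags"][0]

# Option 2: restructure the fixed fields into a record straight away.
customer = Customer(name=raw["name"], email=raw["email"])
print(first_tag, customer.email)
```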

You should certainly think of all of these things as available options. Generally speaking you would use:

  • a tuple when each position has its own meaning. That's why Python returns "multiple values" from a function by returning a tuple, because the different things might be completely different types with different meaning.
  • a dictionary when the available keys vary by record.
  • a data class when the keys are the same for every record. You can use namedtuple for immutable records.
  • a list or tuple when the order is important but the positions are all equivalent. Observe that there's a little tension here between "tuples are immutable, lists are mutable" vs. "tuples are for heterogeneous data and lists are for homogeneous data". Personally I lean towards the former but I've seen sensible arguments for the latter, so if you're asking how people in general make the choice you can't ignore that.
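A minimal sketch of the rules of thumb in this list, using invented names throughout (`min_max`, `attributes`, `Product`, `prices`):

```python
from collections import namedtuple

def min_max(values):
    # Tuple: each position has its own meaning (minimum, then maximum).
    return min(values), max(values)

# Dictionary: the available keys vary from record to record.
attributes = {"colour": "red", "weight_kg": 1.2}

# namedtuple: the same fields on every record, and immutable.
Product = namedtuple("Product", "name price")
product = Product("widget", 10)

# List: order matters, but every position holds the same kind of thing.
prices = [10, 12, 9]

low, high = min_max(prices)
print(low, high, product.name, attributes.get("colour"))
```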

[*] well, tuples and dictionaries are objects too of course, I mean objects other than these data structures ;-)

Steve Jessop
1

I interpret "data object" as an immutable object usually with a few fields.

One option you see a lot is to just use standard dictionaries with the fields as keys. Personally I don't like that: as software grows bigger, it can be hard to see exactly what keys exist and where they come from. People start writing functions that add new keys to existing dictionaries, and it all turns into a mess.

Your empty product class looks a bit bizarre to me -- if you're going to do that, pass the values into the constructor and let that set the attributes. Then it's about the most normal way to do it -- a simple class with some attributes and nothing else.

But namedtuples are cooler because they're immutable, so as you read the code you don't have to worry that some field changes somewhere:

from collections import namedtuple

Product = namedtuple('Product', 'name price')

p = Product("some product", 10)

But now you want to add functionality to it, say a __unicode__ method that returns a description of the product and its price. You can now turn it into a normal class again, with the constructor taking these same arguments. But you can also subclass a namedtuple:

class Product(namedtuple('Product', 'name price')):
    def __unicode__(self):
        return "{} (${})".format(self.name, self.price)

And it's still immutable. That's what I do when I need a pure data object. If you ever need it to become a mutable class, make one of the attributes something mutable, or turn it into a normal class with the same interface after all.
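The snippet above targets Python 2, where `__unicode__` is the text hook. On Python 3 the equivalent would be `__str__`; the following is a sketch of the same subclass under that assumption, with a quick check that assignment is rejected. The `__slots__ = ()` line is an optional extra that avoids a per-instance `__dict__`.

```python
from collections import namedtuple

class Product(namedtuple("Product", "name price")):
    __slots__ = ()  # optional: keep instances free of a per-instance __dict__

    def __str__(self):
        # Python 3 counterpart of the __unicode__ method shown above.
        return "{} (${})".format(self.name, self.price)

p = Product("some product", 10)
description = str(p)

# Fields cannot be reassigned; the attempt raises AttributeError.
try:
    p.price = 12
except AttributeError:
    changed = False
else:
    changed = True

print(description, changed)
```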

RemcoGerlich
0

Either namedtuples or classes work. It really depends on what else you are doing and whether you are providing an API to others. Classes often make for kinder APIs, and they let you define clearer access points.

As for your code example at the end, it would be better to have the class take the arguments during construction where possible. This way you return a real, usable object from "birth" instead of an empty shell waiting to be filled. Again, from an API perspective, if the class cannot be used without a few values set, then there should be no way to make an instance of the class without those values. Otherwise you end up writing annoying "if not set, then fail" checks in a bunch of places to safeguard things. This matters more in languages like C++ or Java, but it is still a good style to adhere to in Python.
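A sketch of what that might look like for the question's example. The `row` argument is a stand-in for whatever the database read returns, and the class is capitalised here purely by convention.

```python
class Product:
    def __init__(self, name, price):
        # Both values are required, so no half-filled instance can exist.
        self.name = name
        self.price = price

def get_product(row):
    # `row` stands in for the result of the database read in the question.
    return Product(row[0], row[1])

pr = get_product(("widget", 10))
total = 3 * pr.price
print(pr.name, total)
```

Calling `Product()` with no arguments now fails immediately with a `TypeError`, rather than producing an object that breaks later when `price` is read.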

Sean Perry