7

I am using Pyramid as a basis for transfer of data for a turn-based video game. The clients use POST data to present their actions, and GET to retrieve serialized game board data. The game data can sometimes involve strings, but is almost always two integers and two tuples:

gamedata = (userid, gamenumber, (sourcex, sourcey), (destx, desty))

My general client-side approach was to pickle the data, encode it as base64, urlencode it, and submit the POST. The server then receives the POST, unpacks the single-item dictionary, decodes the base64, and unpickles the data object.

I want to use Pickle because I can use classes and values. Submitting game data as POST fields can only give me strings.

However, Pickle is regarded as unsafe, so I turned to PyYAML, which serves the same purpose. Using yaml.safe_load(data), I can deserialize data without exposing security flaws. However, safe_load is VERY safe: I cannot even deserialize harmless tuples or lists, even if they only contain integers.

Is there some middle ground here? Is there a way to serialize python structures without at the same time allowing execution of arbitrary code?

My first thought was to write a wrapper for my send and receive functions that uses underscores in value names to recreate tuples, e.g. sending would convert the dictionary value source : (x, y) to source_0 : x, source_1: y. My second thought was that it wasn't a very wise way to develop.

edit: Here's my implementation using JSON... it doesn't seem as powerful as YAML or Pickle, but I'm still concerned there may be security holes.

Client side was constructed a bit more visibly while I experimented:

import urllib, json, base64

arbitrarydata = { 'id':14, 'gn':25, 'sourcecoord':(10,12), 'destcoord':(8,14)}

jsondata = json.dumps(arbitrarydata)
b64data = base64.urlsafe_b64encode(jsondata)
transmitstring = urllib.urlencode( [ ('data', b64data) ] )
urllib.urlopen('http://127.0.0.1:9000/post', transmitstring).read()

Pyramid Server can retrieve the data objects:

json.loads(base64.urlsafe_b64decode(request.POST['data'].encode('ascii')))
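For reference, the same round trip can be sketched for Python 3, where the bytes/str conversions have to be explicit. This is only a sketch: the helper names encode_payload and decode_payload are hypothetical, and parse_qs stands in for what Pyramid would expose as request.POST['data'].

```python
import base64
import json
from urllib.parse import parse_qs, urlencode


def encode_payload(data):
    """Client side: dict -> JSON -> URL-safe base64 -> form-encoded body."""
    json_bytes = json.dumps(data).encode('utf-8')
    b64 = base64.urlsafe_b64encode(json_bytes).decode('ascii')
    return urlencode([('data', b64)])


def decode_payload(body):
    """Server side: reverse the steps (parse_qs stands in for request.POST)."""
    b64 = parse_qs(body)['data'][0]
    json_bytes = base64.urlsafe_b64decode(b64.encode('ascii'))
    return json.loads(json_bytes.decode('utf-8'))


arbitrarydata = {'id': 14, 'gn': 25, 'sourcecoord': (10, 12), 'destcoord': (8, 14)}
received = decode_payload(encode_payload(arbitrarydata))

# JSON has no tuple type, so the coordinates come back as lists
# and must be converted explicitly if tuples are required.
sourcecoord = tuple(received['sourcecoord'])
```

Note that json.loads only ever produces dicts, lists, strings, numbers, booleans and None, so the tuples have to be rebuilt by hand on the receiving side.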

On an unrelated note, I'd love to hear other opinions about whether using POST data this way is acceptable. My game client is in no way browser-based at this time.

John
  • Indeed, you should *not* **ever** use pickles for untrusted data. See http://www.zopatista.com/plone/2007/11/09/one-cookie-please/ – Martijn Pieters Oct 07 '12 at 16:50
  • @MartijnPieters I understand the incredible usefulness of that stack language... ...in one per million projects. Isn't there some subset of pickle that doesn't allow those particular operatives? It seems many have a huge headache to get around a terrible flaw in an incredibly useful system. – John Oct 07 '12 at 17:22
  • No, there isn't. Arbitrary Python objects means an attacker can construct something that'll allow access. For instance, there are powerful basic servers available in the Python stdlib that'd open up ports when instantiated. – Martijn Pieters Oct 07 '12 at 19:15

3 Answers

4

Why not use colander for your serialization and deserialization? Colander turns an object schema into a simple data structure and vice versa, and you can use JSON to send and receive this information.

For example:

import colander

class Item(colander.MappingSchema):
    thing = colander.SchemaNode(colander.String(),
                                validator=colander.OneOf(['foo', 'bar']))
    flag = colander.SchemaNode(colander.Boolean())
    # supported_languages is assumed to be defined elsewhere as a list of allowed values
    language = colander.SchemaNode(colander.String(),
                                   validator=colander.OneOf(supported_languages))

class Items(colander.SequenceSchema):
    item = Item()

The above setup defines a list of item objects, but you can easily define game-specific objects too.

Deserialization becomes:

    items = Items().deserialize(json.loads(jsondata))

and serialization is:

    json.dumps(Items().serialize(items))

Apart from letting you round-trip python objects, it also validates the serialized data to ensure it fits your schema and hasn't been mucked about with.

Tom
Martijn Pieters
3

How about json? The module is part of the Python standard library, and it allows serialization of most generic data without arbitrary code execution.

Amber
0

I don't see raw JSON providing the answer here, as I believe the question specifically mentioned pickling classes and values. I don't believe using straight JSON can serialize and deserialize python classes, while pickle can.

I use a pickle-based serialization method for almost all server-to-server communication, but always include very serious authentication mechanisms (e.g. RSA key-pair matching). However, that means I only deal with trusted sources.
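As an illustration of authenticating a pickle before loading it, here is a minimal sketch that uses an HMAC with a shared secret. This is a simpler, swapped-in technique and not the RSA key-pair scheme mentioned above; SECRET_KEY and both function names are hypothetical.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; both ends must hold the same key.
SECRET_KEY = b'replace-with-a-real-shared-secret'
TAG_LENGTH = hashlib.sha256().digest_size  # 32 bytes


def sign_and_pickle(obj):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_unpickle(blob):
    """Unpickle only if the tag matches; otherwise refuse to touch the payload."""
    tag, payload = blob[:TAG_LENGTH], blob[TAG_LENGTH:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError('signature mismatch; refusing to unpickle')
    return pickle.loads(payload)


blob = sign_and_pickle((14, 25, (10, 12), (8, 14)))
restored = verify_and_unpickle(blob)
```

This only helps when every sender is trusted with the key. A game client distributed to players cannot keep such a secret, so it does not make pickle suitable for the untrusted case.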

If you absolutely need to work with untrusted sources, I would at the very least try to add (much like @MartijnPieters suggests) a schema to validate your transactions. I don't think there is a good way to work with arbitrary pickled data from an untrusted source. You'd have to do something like parse the byte string with some disassembler and then only allow trusted patterns (or block untrusted patterns). I don't know of anything that can do this for pickle.
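One partial mitigation, described in the pickle module documentation, is to override Unpickler.find_class so that only allow-listed globals can be resolved. The sketch below assumes Python 3; ALLOWED_GLOBALS, RestrictedUnpickler, restricted_loads and the Exploit class are illustrative names. It narrows the attack surface, but it should not be read as making pickle safe for untrusted input.

```python
import io
import pickle

# Hypothetical allow-list of (module, name) pairs. It is empty here because
# plain ints, strings, tuples and lists need no global lookups at all.
ALLOWED_GLOBALS = set()


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a class or function.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads, but refuses any global that is not allow-listed."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


gamedata = (14, 25, (10, 12), (8, 14))
restored = restricted_loads(pickle.dumps(gamedata))


class Exploit:
    """Stands in for a malicious payload: unpickling would call print()."""
    def __reduce__(self):
        return (print, ('arbitrary call during unpickling',))


try:
    restricted_loads(pickle.dumps(Exploit()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Every entry added to the allow-list has to be audited for what its constructor can be made to do, which is why a schema over JSON is usually the easier route.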

However, if your class is "simple enough"… you might be able to use the JSONEncoder, which essentially converts your python class to something JSON can serialize… and thus validate…

How to make a class JSON serializable

The catch, however, is that you have to write an encoder class derived from JSONEncoder for your classes.
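A minimal sketch of the JSONEncoder approach, assuming Python 3. Here a separate encoder class, rather than the game class itself, derives from json.JSONEncoder, and an object_hook rebuilds the object on the way back. Move, MoveEncoder, decode_move and the '__move__' tag are all hypothetical names chosen for illustration.

```python
import json


class Move:
    """Hypothetical game class used only for illustration."""
    def __init__(self, userid, gamenumber, source, dest):
        self.userid = userid
        self.gamenumber = gamenumber
        self.source = tuple(source)
        self.dest = tuple(dest)


class MoveEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is called only for objects json cannot serialize itself.
        if isinstance(obj, Move):
            return {'__move__': True,
                    'userid': obj.userid,
                    'gamenumber': obj.gamenumber,
                    'source': obj.source,
                    'dest': obj.dest}
        return super().default(obj)


def decode_move(d):
    """object_hook: rebuild a Move from a tagged dict; other dicts pass through."""
    if d.get('__move__'):
        return Move(d['userid'], d['gamenumber'], d['source'], d['dest'])
    return d


text = json.dumps(Move(14, 25, (10, 12), (8, 14)), cls=MoveEncoder)
move = json.loads(text, object_hook=decode_move)
```

Because decode_move builds only the one class it knows about, from fields it names explicitly, the receiver never instantiates anything the sender chooses. The field values themselves still need validating.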

Community
Mike McKerns