16

I am working on a program in Python and want users to be able to save the data they are working on. I have looked into cPickle; it seems like it would be a fast and easy way to save data, but it also seems insecure. Since entire functions, classes, etc. can be pickled, I am worried that a rogue save file could inject harmful code into the program. Is there a way I can prevent that, or should I look into other methods of saving data, such as converting directly to a string (which also seems insecure) or creating an XML hierarchy and putting the data in that?

I am new to Python, so please bear with me.

Thanks in advance!

EDIT: As for the type of data I am storing, it is mainly dictionaries and lists. Information such as names, speeds, etc. It is fairly simple right now, but may get more complex in the future.

prattmic
  • How do you mean insecure? Unless you're storing some dynamic code in those dictionaries and lists, then there is no way that harmful code can be injected into your program by modifying the pickled files. Pickling should be fine for your requirements. – Il-Bhima Sep 07 '09 at 15:07
  • @Il-Bhima, pickle is insecure. Check the pickle doc page. It says: "Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source." – Nadia Alramli Sep 07 '09 at 15:11
  • There's no **upload** of pickle files, right? If there's no upload, there's no security concern at all, right? – S.Lott Jan 18 '12 at 10:55

7 Answers

23

From your description, JSON encoding is a secure and fast solution. There is a json module in Python 2.6; you can use it like this:

import json
obj = {'key1': 'value1', 'key2': [1, 2, 3, 4], 'key3': 1322}
encoded = json.dumps(obj)
obj = json.loads(encoded)

The JSON format is human-readable and very similar to the string representation of a Python dictionary, and it doesn't have pickle's security issues. If you don't have Python 2.6, you can install cjson or simplejson.

Unlike pickle, JSON can't save arbitrary Python objects. But you can use it to save strings, numbers, dictionaries, lists, and so on, which is enough for most cases.
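
For saving to an actual file, a minimal sketch might look like the following. It assumes Python 3 (the thread itself targets Python 2.6); the helper names `save_data` and `load_data` and the sample data are hypothetical, chosen only to match the "names, speeds" data described in the question.

```python
import json
import os
import tempfile


def save_data(path, data):
    """Write plain dicts, lists, strings and numbers to a JSON file."""
    with open(path, "w") as f:
        json.dump(data, f, indent=2)


def load_data(path):
    """Read the data back; json.load only builds basic Python types."""
    with open(path) as f:
        return json.load(f)


# Hypothetical save data for illustration.
state = {"name": "runner", "speed": 12.5, "laps": [61.2, 59.8]}

# A temporary directory stands in for wherever the program keeps its saves.
path = os.path.join(tempfile.mkdtemp(), "save.json")
save_data(path, state)
restored = load_data(path)
```

One thing to watch for: JSON object keys are always strings, so a dictionary with integer keys comes back with string keys.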

To explain why pickle is insecure, here is what the Python docs say:

Most of the security issues surrounding the pickle and cPickle module involve unpickling. There are no known security vulnerabilities related to pickling because you (the programmer) control the objects that pickle will interact with, and all it produces is a string.

However, for unpickling, it is never a good idea to unpickle an untrusted string whose origins are dubious, for example, strings read from a socket. This is because unpickling can create unexpected objects and even potentially run methods of those objects, such as their class constructor or destructor ... The moral of the story is that you should be really careful about the source of the strings your application unpickles.

There are some ways to defend yourself but it is much easier to use JSON in your case.
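
One such defense, sketched here under the assumption of Python 3, is to subclass `pickle.Unpickler` and override `find_class` so that no global (class or function) can be looked up during loading. The names `NoGlobalsUnpickler` and `restricted_loads` are hypothetical. This only narrows what a pickle can do; it is not a claim that the result is safe against every malformed input.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses every class or function lookup."""

    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global by name.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )


def restricted_loads(data):
    """Like pickle.loads, but only basic containers and scalars get through."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


# Plain dicts and lists need no globals, so they load normally.
loaded = restricted_loads(pickle.dumps({"speed": [1, 2, 3]}))

# A pickle that references a function (here the builtin len) is rejected.
try:
    restricted_loads(pickle.dumps(len))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

If the data really is just dictionaries and lists, this ends up offering little beyond what JSON already gives you, which is why JSON is the simpler choice here.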

Nadia Alramli
  • Do you suggest that Pickle is less secure? – u0b34a0f6ae Sep 07 '09 at 15:01
  • Of course it is less secure: "The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source." From the Python docs: http://docs.python.org/library/pickle.html – Nadia Alramli Sep 07 '09 at 15:03
  • I suspect the OP is more worried about the integrity of the application, that it continues to work bug-free, rather than black hat intrusion. – u0b34a0f6ae Sep 07 '09 at 15:06
  • @kaizer.se: The OP is worried about security. He said: "I am worried that a rogue save file could inject harmful code into the program." – Nadia Alramli Sep 07 '09 at 15:12
3

You could do something like:

to write

  • Pickle
  • Sign pickled file
  • Done

to read

  • Check pickled file's signature
  • Unpickle
  • Use
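
The steps above could be sketched with an HMAC from the standard library. This assumes Python 3; `SECRET_KEY`, `dumps_signed` and `loads_signed` are hypothetical names, and the key shown is a placeholder.

```python
import hashlib
import hmac
import pickle

# Placeholder secret for illustration only.
SECRET_KEY = b"replace-with-a-real-secret"
DIGEST_SIZE = hashlib.sha256().digest_size


def dumps_signed(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 signature of the payload."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload


def loads_signed(blob, key=SECRET_KEY):
    """Verify the signature first; unpickle only if it matches."""
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = dumps_signed({"name": "runner", "speed": 12.5})
data = loads_signed(blob)
```

The check is only as strong as the secrecy of the key. If the key ships inside the application, anyone who can read the program can sign their own files.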

I wonder, though, what makes you think the data files are going to be tampered with but your application is not?

Vinko Vrsalovic
  • Encryption does not protect against tampering/injected data. Signing does, but the real problem is hiding the keys. So the best solution may be some kind of checksum or hash value appended to the file. – Ber Sep 07 '09 at 15:06
  • I wonder how you would effectively insert meaningful data into an encrypted file; you'd just invalidate the file. But you're right. – Vinko Vrsalovic Sep 07 '09 at 15:22
2

**In this answer, I'm only concerned about accidental corruption of the application's integrity.**

Pickle is "secure". What might be insecure is accessing code you didn't write, for example in plugins; that is not relevant to pickles though.

When you pickle an object, all its data is saved, but its code and implementation are not. This means that when unpickled, an updated object might find it has "old-style" data inside (if you have updated the implementation). This is something you must know and handle, if applicable.

Pickling strings, lists, numbers, and dicts is very easy and works perfectly, comparably to JSON. The pickle magic is that -- sometimes without adjustment -- even complex Python objects can be pickled. But only data is pickled; the instances are reconstructed simply from the saved module name and type name of the object.

u0b34a0f6ae
  • You can modify pickled data to reference "eval", and thus run code. So, loading an untrusted pickle is as bad as running untrusted code. – mthurlin Sep 07 '09 at 15:41
  • truppo: Well, I understand that it could create any type of object. However, I'm only concerned about accidental corruption of the application's integrity. – u0b34a0f6ae Sep 07 '09 at 15:46
1

You need to give us more context before we can answer: what type of data are you saving, how much is there, how do you want to access it?

As for pickles: they do not store code. When you pickle a function or class, it is the name that is stored, not the actual code itself.
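
A quick way to see this for yourself, assuming Python 3, is to pickle a function and look at the bytes. The example uses the builtin `len`; the variable names are only illustrative.

```python
import pickle
import pickletools

# Pickle a function: only a reference to it is written, not its body.
pickled = pickle.dumps(len)

# The stream contains the module name and the function name as text.
has_module_name = b"builtins" in pickled
has_function_name = b"len" in pickled

# Loading resolves the name again and hands back the very same object.
same_object = pickle.loads(pickled) is len

# Prints a human-readable listing of the opcodes in the pickle.
pickletools.dis(pickled)
```

The same name lookup is what makes untrusted pickles dangerous: the loader will resolve whatever name the file asks for, which is the point made in the comments above about referencing "eval".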

Ned Batchelder
1

You should use a database of some kind. Storing in pickle format isn't a good idea (in most cases). You may consider:

  • SQLite - (included in Python 2.5+) fast and simple, but requires knowledge of SQL and DB-API
  • buzhug - non-SQL, file based database with pythonic syntax
  • SQL database - you may use an interface to a DBMS (like MySQL, PostgreSQL, etc.), but it's only worthwhile for larger amounts of data (thousands of records).
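
For the SQLite option, a minimal sketch with the standard `sqlite3` module might look like this. It assumes Python 3; the table name `racers`, its columns and the sample rows are hypothetical, and an in-memory database stands in for a real save file.

```python
import sqlite3

# ":memory:" keeps the example self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE racers (name TEXT PRIMARY KEY, speed REAL)")

# Hypothetical data in the spirit of the question ("names, speeds").
racers = {"alice": 12.5, "bob": 9.75}

# Parameterized inserts; the "with" block commits on success.
with conn:
    conn.executemany(
        "INSERT INTO racers (name, speed) VALUES (?, ?)",
        list(racers.items()),
    )

# Each row comes back as a (name, speed) tuple, so dict() rebuilds the mapping.
loaded = dict(conn.execute("SELECT name, speed FROM racers"))
conn.close()
```

Because rows are read back as plain values, a tampered database file can corrupt the data but cannot make the loader construct arbitrary objects the way an untrusted pickle can.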

You may find some other solutions here.

Tupteq
1

Who -- specifically -- is the sociopath who's going through the effort to break a program by hacking the pickled file?

It's Python. The sociopath has your source. They don't need to fool around hacking your pickle file. They can just edit your source and do all the "damage" they want.

Don't worry about "insecurity" unless you're involved in litigation with organized crime syndicates.

Don't worry about "a rogue save file could inject harmful code into the program". No one will bother with a rogue save file when they have the source.

S.Lott
  • Imagine a scenario where you have a Python script running on someone's computer. A hacker sends that person a file saying, "Hey, this is the latest data file. Please open." They may never have access to the source at all and can still cause problems. – Jordan Reiter Jul 20 '11 at 15:07
  • @Jordan Reiter: Imagine a scenario where you have a Python script running on someone's computer. A hacker sends that person a file saying, "Hey, this is the latest version of the application. Please install." They can still cause problems without wasting time hacking a pickled file. – S.Lott Jul 20 '11 at 15:09
  • Heck, in that scenario it hardly even matters whether you have the source or not. I'd also argue that while it's easier for a hacker to just change the source, it's probably easier and more likely to get an end user to double click on a file than install a new app. Also, I'm pretty sure all the hacker needs to know is that you're opening pickled files in order to inject code; they don't need to know the source at all. – Jordan Reiter Jul 21 '11 at 17:14
  • @Jordan Reiter: I'm pretty sure all the hacker needs to know is that you're running Python and are careless about the kind of files you double-click on; they don't need to know how to hack a pickle file at all. – S.Lott Jul 21 '11 at 17:18
  • @S.Lott: I disagree. If you can upload a file to a website built on Python, and it is possible to execute the file for whatever technical reason, you are in a bad place. OK, Python is open and having the source code is enough, as it is with PHP and Ruby, but when you have a website, the source code is not visible, and "pickle injection" can work (mostly on ad-hoc web pages in pure Python, but still). – edomaur Jan 18 '12 at 07:57
  • @edomaur: Yes. Pickle injection might work. But it's not a *serious* threat. A serious threat is bad security practices. There's no **upload** of pickle files. The scenario of a web app saving private data in a pickle file is not a security opportunity. The pickle file saved -- internally -- by a web app is **less** of a threat than someone just hacking the code itself. – S.Lott Jan 18 '12 at 10:54
1

You might enjoy working with the y_serial module over at http://yserial.sourceforge.net, which reads like a tutorial but operationally offers working code for serialization and persistence. The commentary discusses some of the pros and cons relevant to issues raised here.

It's designed to be a general solution to warehousing compressed Python objects with SQLite (with almost no SQL fuss ;-)

Hope this helps.