
Setup: I start with a non-square 2-dimensional list of values:

myList = [ ['A','B','C'], np.array([1, 2, 3, 4]), np.array([0.5, 1.5]) ]

This list is then written to a file. Later, I need to extract this same list from the file to do work on.

Current method: I handle writing in an extremely simple manner: str() and f.write() calls. Partly this is because it was easy to set up; partly it is because the same file contains other, non-list objects (strings, dicts, ints, etc.) that are also being written. I run into trouble on the other end, however, when I load the file. The most direct approach gives

loadList = list(stringFromFile)
print(loadList)

> [ "[" "[" "'" "A" "'" "," "'" "B" "'" ...

and so on. Clearly, not what I am looking for. Adding in a splitter does a little better:

loadList = list(stringFromFile.split(','))
print(loadList)

> [ "[['A'", "'B'", "'C']", "np.array([1", "2", "3", "4])", "np.array([0.5", "1.5])]" ]

...but mishandles the subdivisions. Redefining the splitter (...split('],')) mishandles the array() elements, and so on.

Following this path, I can see a way to make it work with a significant number of if catches, a carefully refined splitter, and some special cases. However, it feels very clunky to do it in this manner. I also question whether it will be generalizable to any oddly-constructed 2-dimensional list-like that users might throw at it.

Is there a more elegant way to implement this? I am open to changing my write method, read method, or both, but I cannot change the handling of the list object itself without some pretty drastic redesign of the entire program.

Izzy

4 Answers


As far as I can tell, the problem is that you're overloading the purpose of your saved data. You're creating your own data store format; you want it to be human-readable, but you also want to read it straight back into a variety of Python data structures. If you insist on mixing unrestricted data in a single container (a text file), you're creating your own difficulties.

I won't go into the information theory conflicts; this is simply asking too much of simple text. Various modules have their own methods to write and read data; see numpy.savetxt for one such example. Python has a few add-on modules to handle built-in types, which you can find with appropriate searches. JSON works well as a common carrier for structured data.

The usual treatment is one of two possibilities:

  1. Pick a single package for your data representation and stick with it.
  2. Carefully write in your Pythonic fashion; read the file back as text and apply eval-style functions to reconstruct your earlier data. This way is extremely fragile.
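A minimal sketch of the numpy.savetxt round trip mentioned above. Note that savetxt expects a regular 2-D numeric array, so the question's ragged, mixed-type list would have to be saved piecewise; the filename arr.txt is just for illustration:

```python
import numpy as np

arr = np.array([[1.0, 2.0], [3.0, 4.0]])

# Writes a human-readable text file, one row per line.
np.savetxt('arr.txt', arr)

# Reads it back as a float array of the same shape.
loaded = np.loadtxt('arr.txt')
```

The text file stays editable by hand, which keeps some of the human-readability the question asks for, at the cost of only handling uniform numeric data per file.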
Prune
  • Thank you for the response. Yes, I have almost certainly made this difficult for myself. Each of the elements I mentioned defines a part of a whole - they are the constructors for a rather intensive custom class I wrote, which is defined by a directory (string), file naming format (string), file load options (dict), valid elements of the file name (the list of this problem), and a few other items. I am avoiding eval() for security reasons. That said, I'm strongly considering sacrificing human-readability to use np.savetxt() as you suggest. – Izzy Oct 10 '19 at 17:37
  • `np.save` utterly sacrifices readability; `savetxt` retains enough for most purposes. I'm glad you're avoiding `literal_eval` and its friends. – Prune Oct 10 '19 at 17:54

Using str to write out the list is going to make things difficult. str is designed to produce human-readable strings, which may not be well-suited for machine parsing.

In general, this is an example of serialization, and it's probably easiest to use a library that will handle both "directions" (serializing an object to a file, and de-serializing from file contents) for you.

There are many approaches you could use. Here are two that would be simple to use from the Python standard library: Pickle, or JSON.

Pickle

>>> import numpy
>>> import pickle
>>> l = [ ['A','B','C'], numpy.array([1, 2, 3, 4]), numpy.array([0.5, 1.5]) ]
>>> l
[['A', 'B', 'C'], array([1, 2, 3, 4]), array([0.5, 1.5])]
>>>
>>> # Save the list to a file.
>>> with open('data.pkl', 'wb') as f:
...     pickle.dump(l, f)
... 
>>> # Load the list from a file.
>>> with open('data.pkl', 'rb') as f:
...     l_copy = pickle.load(f)
... 
>>> l_copy
[['A', 'B', 'C'], array([1, 2, 3, 4]), array([0.5, 1.5])]

JSON

The main caveat with JSON is that, when loading data, there would not be an easy way to load certain elements as numpy arrays and other elements as Python lists. In other words, the distinction between types would not be preserved with a naive JSON serialization. See NumPy array is not JSON serializable.

>>> import numpy
>>> import json
>>> 
>>> l = [ ['A','B','C'], numpy.array([1, 2, 3, 4]), numpy.array([0.5, 1.5]) ]
>>> def serialize_as_json(nested_list, filename):
...     # Need to convert numpy array to Python list of Python ints/floats.
...     l = [(elem.tolist() if isinstance(elem, numpy.ndarray) else elem) for elem in nested_list]
...     with open(filename, 'w') as f:
...         json.dump(l, f)
... 
>>> serialize_as_json(l, 'data.json')
>>> 
>>> with open('data.json') as f:
...     l_copy = json.load(f)
... 
>>> # Note that l_copy contains lists, not numpy arrays.
>>> l_copy
[['A', 'B', 'C'], [1, 2, 3, 4], [0.5, 1.5]]
NicholasM
  • Lots of good information here, thank you. I have currently temporarily solved the problem with the crude method I mentioned near the end of my question - I have several fish to fry today - but I plan to revisit it down the line. I'm not sure yet which method I'll use. pickle seems viable, but I might be able to make numpy.savetxt() work as well, as suggested by @Prune. – Izzy Oct 10 '19 at 17:59

Assuming that your file behaves like a simple .txt document, and that the list you're interested in occupies the very first line (and only that line), then what you want to do is the following:

import numpy as np

with open('list.txt') as f:
    L = eval(f.readline())

Here 'list.txt' is your file and L is the variable where the script stores the list. It's not the best way, but it's quick and simple, and unless your document is too fancy it will do the job.
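If the line holds only Python literals (no np.array(...) calls), ast.literal_eval from the standard library is a safer alternative to eval, since it refuses anything but literal structures. This is a sketch of that restriction, not a drop-in fix for the question's mixed list:

```python
import ast

# literal_eval parses Python literal syntax only: strings, numbers,
# tuples, lists, dicts, sets, booleans, and None.
line = "[['A', 'B', 'C'], [1, 2, 3, 4], [0.5, 1.5]]"
L = ast.literal_eval(line)

# A line containing np.array(...) text would raise ValueError instead
# of executing arbitrary code, which is the point of using it.
```

The numpy elements would then need to be rebuilt separately, e.g. by passing the relevant sublists to np.array().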

Michele Bastione
  • Thank you. Unfortunately the relevant information is not in the first line. As @Prune alluded, I've tried to keep my data storage method human-readable, in order to allow humans to create their own files from scratch if desired (in some cases, it may be easier - or at least, not harder - than going through my save function). Additionally, I would like to avoid eval(), if possible. – Izzy Oct 10 '19 at 17:34

I would suggest working with JSON. The JSON format is a standard approach to object-to-string and string-to-object operations. The only drawback is that it does not work with non-built-in Python objects (e.g. numpy arrays). However, this is easily worked around by implementing the correct JSON encoder and decoder. Here's an example:

import json
import numpy as np

class NumpyAwareJSONEncoder(json.JSONEncoder):
    """Overrides the default json encoder to allow the encoding of numpy arrays."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            # tolist() converts numpy scalars to plain Python ints/floats,
            # which the base encoder can serialize.
            return {"__numpy__": True, "data": obj.tolist()}
        # else fall back to the default behaviour (raises TypeError)
        return super().default(obj)

class NumpyAwareJSONDecoder(json.JSONDecoder):
    """Same thing as the encoder above but for the decode part."""
    def __init__(self, *args, **kwargs):
        # JSONDecoder has no default() hook; object_hook is the supported
        # way to intercept every decoded JSON object (dict).
        super().__init__(object_hook=self.object_hook, *args, **kwargs)

    def object_hook(self, obj):
        if obj.get("__numpy__"):
            return np.array(obj["data"])
        return obj


to_convert = [...]  # some fancy list with non-built-in objects
# write data to file
with open(path_to_file, "w") as f:
    json.dump(to_convert, f, cls=NumpyAwareJSONEncoder)
# now read
with open(path_to_file, "r") as f:
    data = json.load(f, cls=NumpyAwareJSONDecoder)
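The same tagging idea can be sketched with plain default=/object_hook= functions instead of encoder/decoder subclasses; the names encode_numpy and decode_numpy here are mine, not from any library, and the example round-trips the question's list in memory:

```python
import json
import numpy as np

def encode_numpy(obj):
    # Tag numpy arrays so the decoder can tell them apart from plain lists.
    if isinstance(obj, np.ndarray):
        return {"__numpy__": True, "data": obj.tolist()}
    raise TypeError(f"not JSON serializable: {type(obj)}")

def decode_numpy(obj):
    # Called for every decoded JSON object; rebuild tagged ones as arrays.
    if obj.get("__numpy__"):
        return np.array(obj["data"])
    return obj

data = [['A', 'B', 'C'], np.array([1, 2, 3, 4]), np.array([0.5, 1.5])]
text = json.dumps(data, default=encode_numpy)
restored = json.loads(text, object_hook=decode_numpy)
```

After the round trip, restored[0] is still a plain list while restored[1] and restored[2] come back as numpy arrays, which is exactly the type distinction a naive JSON dump would lose.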

You can look at the official documentation here.

marc_s
fgoudra