The statistical software Stata allows short text snippets to be saved within a dataset. This is accomplished using notes and/or characteristics.
This is a feature of great value to me as it allows me to save a variety of information, ranging from reminders and to-do lists to information about how I generated the data, or even what the estimation method for a particular variable was.
I am now trying to come up with similar functionality in Python 3.6. So far, I have looked online and consulted a number of posts, none of which address exactly what I want to do.
A few reference posts include:
What is the difference between save a pandas dataframe to pickle and to csv?
What is the fastest way to upload a big csv file in notebook to work with python pandas?
For a small NumPy array, I have concluded that a combination of the function numpy.savez() and a dictionary can adequately store all relevant information in a single file.
For example:
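import numpy as np
a = np.array([[2,4],[6,8],[10,12]])
d = {"first": 1, "second": "two", "third": 3}
# The dict is wrapped in a 0-d object array and pickled inside the .npz file
np.savez("whatever_name.npz", a=a, d=d)
# Newer NumPy versions default to allow_pickle=False, so object arrays need it set explicitly
data = np.load("whatever_name.npz", allow_pickle=True)
arr = data['a']
dic = data['d'].item()  # unwrap the 0-d object array back into the dict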
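A pickle-free variant of the same idea, as a minimal sketch: the dictionary is serialized to a JSON string, which NumPy stores as an ordinary string array, so the file can be loaded without allow_pickle=True. The helper names save_with_notes and load_with_notes are hypothetical, and the sketch assumes the notes contain only JSON-serializable values (strings, numbers, lists, nested dicts).

```python
import json
import os
import tempfile

import numpy as np


def save_with_notes(path, arr, notes):
    """Save an array plus a JSON-serializable dict of notes in one .npz file."""
    # json.dumps returns a plain str; NumPy stores it as a 0-d unicode array,
    # so no object array (and therefore no pickle) is involved.
    np.savez(path, a=arr, notes=json.dumps(notes))


def load_with_notes(path):
    """Return (array, notes dict) from a file written by save_with_notes."""
    with np.load(path) as data:  # allow_pickle is left at its default
        arr = data["a"]
        notes = json.loads(data["notes"].item())
    return arr, notes


a = np.array([[2, 4], [6, 8], [10, 12]])
d = {"first": 1, "second": "two", "third": 3}

# A temporary directory is used here only to keep the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "whatever_name.npz")
save_with_notes(path, a, d)
arr, dic = load_with_notes(path)
```

The trade-off is that JSON cannot round-trip arbitrary Python objects (tuples come back as lists, and non-string dictionary keys become strings), whereas the pickled version can, at the cost of requiring allow_pickle=True on load.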
However, the question remains:
Are there better ways to potentially incorporate other pieces of information in a file containing a NumPy array or a (large) Pandas DataFrame?
I am particularly interested in hearing about the pros and cons of any suggestions you may have, with examples. The fewer dependencies, the better.