
I have a list bhs created this way:

#Allotting Black Holes at z=6
bhs=[0]*1000

for i in tqdm(range(0,1000),position=0, leave=True):
    if len(mass_array1[i])!=0:
        bhs[i]=np.zeros(len(mass_array1[i]))
    else:
        bhs[i]=np.zeros(1)
    for j in range (len(mass_array1[i])):
        bhs[i][j]=np.random.lognormal(np.log(MbhthShimasaku(mass_array1[i],6)[j]),np.log(5))

I need to save the result in a text file. I have tried numpy.savetxt, pickle.dump and open():

open()

with open("bhs.txt", 'w') as file:
        for row in bhs:
            s = " ".join(map(str, row))
            file.write(s+'\n')

#Result .txt file:
0.0
0.0
0.0
0.0
1937651.7861915156 246221.20328840986 226756.87389065413
0.0
0.0

numpy.savetxt()

bhs=np.array(bhs)
np.savetxt('bhs.txt',bhs,fmt='%s')

#Result .txt file:
[0.]
[0.]
[0.]
[0.]
[26447480.89508711  1097038.92200952   971383.67441455]
[0.]
[0.]
[0.]
[0.]
[0.]

pickle

bhs.append(bhs)

tupleA=tuple(bhs)

filename = 'bhs.p'
with open(filename, 'wb') as filehandler:
    pickle.dump(tupleA, filehandler)

#Result .p file
array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([1937651.78619152,  246221.20328841,  226756.87389065])

I am unable to get back the original array/list from all these saved files. When I try to use any of these loaded lists, I get some kind of error:

np.loadtxt

could not convert string to float: '[0.]'

open()

my_file = open("bhs.txt", "r")
content = my_file.read()
content_list = content.split(",")
my_file.close()
print(content_list)

[0.]\n[0.]\n[26447480.89508711  1097038.92200952   971383.67441455]\n[0.]\n[0.]\n[0.]\n[0.]\n[0.]\n[0.]\n[0.]\n[0.]\n

Sample of bhs as a list

array([1461403.98258597]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([0.]), array([26447480.89508711,  1097038.92200952,   971383.67441455]),

How can I save my multidimensional list so that I can get back exactly what I started with?

Extra: mass_array1 file

https://drive.google.com/file/d/1Kdmv1fcbDelEzGmi4BOE4HjUbM7Cg23b/view?usp=sharing

And this is how I import it into python:

You need to unzip the file into a folder first.

dirlist=["bh2e10"]
import time

mass_array1=[0]*1000
#print(mass_array)
#read all the files 
for i,X in enumerate(dirlist):
    exec('filelist=glob.glob("%s/test*.dat")'%(X))
    #exec("mass_array%s=[]"%X)
    initial_mass=[]
    for j,Y in tqdm(enumerate(filelist),position=0, leave=True, total=1000):
        Y=Y.replace(os.sep, '/')
        #Z=int(Y[10:13])
        Z=int(re.findall(r"\d+", Y)[2])
        #print(Z)
        mass_array1[Z]=[]
        #print('i=',Z,end="\r")
        #print('i=',Z,end="\r")
        exec("initial_partial=np.loadtxt('%s',max_rows=1)"%(Y))
        exec("initial_mass=np.append(initial_mass,initial_partial)")
        exec("mass_partial=np.loadtxt('%s',skiprows=1)"%(Y))
        mass_array1[Z]=np.append(mass_partial,mass_array1[Z])
        #mass_array1[Z]=mass_partial
Aryan Bansal
    Could you use [h5py](https://docs.h5py.org/en/stable/quick.html)? – dshanahan Jun 01 '21 at 21:00
  • JSON is one very practical way. – Tim Roberts Jun 01 '21 at 21:04
  • Can you update your post and paste a sample of bhs as a list please. – Corralien Jun 01 '21 at 21:04
  • Really, every row should have the same number of elements. – Tim Roberts Jun 01 '21 at 21:05
  • @Corralien OK I have updated and added a sample of bhs – Aryan Bansal Jun 01 '21 at 21:20
  • @dshanahan I have never used h5py but I will look into it. Thanks for the suggestion. – Aryan Bansal Jun 01 '21 at 21:23
  • @TimRoberts Thanks. I tried JSON as well, but got some errors there as well when trying to load it. I will upload the mass_array1 file on my Gdrive and link it here so that you can give it a try. – Aryan Bansal Jun 01 '21 at 21:26
  • If you want to use the file in the same environment, you can use pickle (via the joblib package, bundled with scikit-learn) to dump the object and load the file when you need it again. You can google "Pickled model as a file using joblib". – quickhaze Jun 01 '21 at 21:28
  • 2
    Why why why are you using `exec`? It's very dangerous and totally unnecessary here. – Tim Roberts Jun 01 '21 at 21:30
  • @ultron Yes, pickle was the most successful amongst all the methods I tried but it was a little inconvenient to use and creates [[[]]] triple brackett list. So each time to get my answer I need to do bhs[0][0] to get an element from the list. Idk if I am using pickle wrong or it is supposed to look like this. But if nothing else works, I will have to settle with Pickle. Thank again for the suggestion. – Aryan Bansal Jun 01 '21 at 21:32
  • @TimRoberts hahaha ya that's true. I probably don't need it. I'll fix it. Thanks – Aryan Bansal Jun 01 '21 at 21:33

3 Answers


Use csv module

import numpy as np
import csv

bhs = [[0.], [0.], [0.], [0.], [26447480.89508711, 1097038.92200952, 971383.67441455], [0.], [0.], [0.], [0.], [0.]]

# write to csv
with open("bhs.txt", mode="w", newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(bhs)

# read from csv
with open("bhs.txt", mode="r") as csvfile:
    reader = csv.reader(csvfile)
    bhs1 = [np.array(row, dtype=float).tolist() for row in reader]
     
>>> bhs == bhs1
True
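If a rectangular layout is ever needed (e.g. for np.savetxt/np.loadtxt), one option is to pad the ragged rows with NaN up to the longest row first. The snippet below is a sketch of that idea, not part of the original answer; the file name is arbitrary:

```python
import numpy as np

bhs = [[0.], [0.], [26447480.89508711, 1097038.92200952, 971383.67441455], [0.]]

# Pad every row with NaN up to the longest row, so the data becomes a
# proper 2-D table that savetxt/loadtxt can round-trip.
width = max(len(row) for row in bhs)
table = np.full((len(bhs), width), np.nan)
for i, row in enumerate(bhs):
    table[i, :len(row)] = row

np.savetxt("bhs_padded.txt", table)
back = np.loadtxt("bhs_padded.txt")
```

The NaN padding marks "no value here", so the original row lengths can be recovered by dropping the trailing NaNs on read.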

Update: use joblib

import joblib

bhs = [[0.], [0.], [0.], [0.], [26447480.89508711, 1097038.92200952, 971383.67441455], [0.], [0.], [0.], [0.], [0.]]

joblib.dump(bhs, "bhs.txt")

bhs1 = joblib.load("bhs.txt")
>>> bhs == bhs1
True
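joblib.dump/joblib.load build on Python's pickle protocol (with extra handling for large NumPy arrays), so the same round-trip also works with the standard-library pickle module, including empty sublists. A minimal sketch, with an illustrative file name:

```python
import pickle

# Ragged list, including empty sublists -- pickle preserves the exact
# structure, so what you load is what you saved.
bhs = [[], [0.0], [26447480.89508711, 1097038.92200952, 971383.67441455], []]

with open("bhs.pkl", "wb") as f:
    pickle.dump(bhs, f)

with open("bhs.pkl", "rb") as f:
    restored = pickle.load(f)
```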
Corralien
  • How are you going to restore that? Clearly, the OP has been able to SAVE the data easily. – Tim Roberts Jun 01 '21 at 21:16
  • @TimRoberts. Is it simple enough? – Corralien Jun 01 '21 at 21:22
  • There is still a minor problem. On Windows, this results in an extra blank line between each line. You need `newline=''` when opening the file. https://stackoverflow.com/questions/3191528/csv-in-python-adding-an-extra-carriage-return-on-windows – Tim Roberts Jun 01 '21 at 21:28
  • @TimRoberts. Why Whyndows :-) Can you try my new answer on Windows, please? – Corralien Jun 01 '21 at 21:31
  • Feel free to edit my answer. I guess I need to append `lineterminator='\n'` to `csv.writer(csvfile)`, right? But I can't test if it works. – Corralien Jun 01 '21 at 21:35
  • @Corralien Jesus!!! That Joblib module saved my life. Thanks a lott. I will most likely make this answer 'Accepted' answer once I am sure it works with all other functions in my code. But thanks again. – Aryan Bansal Jun 01 '21 at 22:10
  • @Corralien Is it possible if it can work even with empty array. I mean if bhs[i]=[] instead of [0.]? Originally, my bhs list had empty elements [] which I changed to [0] for the sake of making this question easier to ask. Even if its not possible, it is fine. You still saved my life :D. – Aryan Bansal Jun 01 '21 at 22:13
  • It doesn't matter if your list is empty, it remains a serializable Python object, so it works. I think remove the csv part of my answer. I hope you will accept my answer :-) – Corralien Jun 01 '21 at 22:27
  • Thanks again. It worked perfectly. I accepted this answer :D – Aryan Bansal Jun 02 '21 at 08:09
  • Great. You upvoted for the question, not accepted ;-) – Corralien Jun 02 '21 at 08:31

First, understand what you created:

In [94]: bhs = [0]*5
In [95]: bhs[1]=np.random.rand(4)*1000
In [96]: bhs
Out[96]: [0, array([900.04634682,  67.58574156, 364.69588687, 868.10145473]), 0, 0, 0]

It's a list, with mostly 0s, and one or more 1d arrays.

The csv file format is intended for a "table", many rows all with the same number of columns.

savetxt writes an array, preferably 2d, but it can work with 1d. But you gave it a list. So it had to make an array first:

In [98]: np.array(bhs)
<ipython-input-98-fe2575327968>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  np.array(bhs)
Out[98]: 
array([0, array([900.04634682,  67.58574156, 364.69588687, 868.10145473]),
       0, 0, 0], dtype=object)

The result of saving that with %s is:

In [99]: cat bhs.txt
0
[900.04634682  67.58574156 364.69588687 868.10145473]
0
0
0

That array element was written as its str display. Such a file is hard, though not impossible, to load with a csv tool. It is not a proper csv file.
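To illustrate "hard, though not impossible": the file can be parsed back by stripping the brackets that str(array) added and splitting on whitespace. This is a fragile sketch that only works because we know the exact print format; the file name and sample contents mirror the savetxt example above:

```python
import numpy as np

# Recreate the '%s'-formatted file from the savetxt example above.
sample = "0\n[900.04634682  67.58574156 364.69588687 868.10145473]\n0\n0\n0\n"
with open("bhs_str_demo.txt", "w") as f:
    f.write(sample)

# Fragile parse-back: bracketed lines are array displays, bare lines
# are scalars. Strip the brackets, then split on whitespace.
rows = []
with open("bhs_str_demo.txt") as f:
    for line in f:
        line = line.strip()
        if line.startswith("["):
            rows.append(np.array(line.strip("[]").split(), dtype=float))
        else:
            rows.append(float(line))
```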

pickle can handle almost any python object, including a list of various stuff:

In [102]: with open('bhs.p','wb') as f:
     ...:     pickle.dump(bhs, f)
     ...: 
In [105]: with open('bhs.p','rb') as f:
     ...:     new=pickle.load(f)
     ...: 
     ...: 
In [106]: new
Out[106]: [0, array([900.04634682,  67.58574156, 364.69588687, 868.10145473]), 0, 0, 0]

The array version of the list in Out[98] can also be saved as an array (with embedded pickling):

In [110]: np.save('foo.npy',_98)
In [111]: np.load('foo.npy', allow_pickle=True)
Out[111]: 
array([0, array([900.04634682,  67.58574156, 364.69588687, 868.10145473]),
       0, 0, 0], dtype=object)

I question whether you really want, or should be, creating a list of arrays like this. In any case, make sure you understand what you've created before trying to save it in randomly selected formats.

hpaulj

You would do better to save it in a .csv (comma-separated) file so you can easily upload or retrieve it.

  • That's not an answer. That's a suggestion and belongs in comments. The variable number of columns will still be an issue. – Tim Roberts Jun 01 '21 at 21:15