How to save and read back a multidimensional string array (possibly) with numpy?

Question

I need to save data to a file where each line follows this format: <string1> <array of thousands of floats> <string2>. So I thought about concatenating the data into one huge string array, as below:

import numpy as np

labels = ['label1', 'label2', 'label3']
values = [[0.1, 0.4, 0.5],
          [0.1, 0.2, 0.1],
          [0.5, 0.6, 1.0]]
descriptions = ['desc1', 'desc2', 'desc3']
concat1 = np.r_['1,2,0', labels, values]
concat2 = np.r_['1,2,0', concat1, descriptions]

Result:

[['label1' '0.1' '0.4' '0.5' 'desc1']
 ['label2' '0.1' '0.2' '0.1' 'desc2']
 ['label3' '0.5' '0.6' '1.0' 'desc3']]

I know that if each subarray were small enough I could do something like this:

np.savetxt('output.txt', concat2, fmt = "%s %s %s %s %s")

But my problem involves thousands of values, so it's kind of impractical to type the format one variable at a time.

Any other suggestions for how to save it to a file?

PS: It sounds a bit weird to save floats as strings, but my superior asked it like this, so...

Have you tried [numpy.ndarray.tofile](https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.ndarray.tofile.html)? — Brad Solomon, Jun 17 '17 at 23:44
`fmt="%s"` should work. `savetxt` replicates that value to account for the number of values in the array `row`. That is, it constructs `newfmt = delimiter.join([fmt]*len(row))`. — hpaulj, Jun 18 '17 at 00:27
@BradSolomon It works but then I can't read back into a multidimensional array. It's saved as a single line. — plethora, Jun 18 '17 at 00:29
So you are talking about a csv file with thousands of columns. Also one that's a mix of string and float columns? How were you going to read it? With `genfromtxt` into a structured array? Save and read might be simpler if you saved the string data to one file, and floats to another. — hpaulj, Jun 18 '17 at 00:41
@hpaulj, that's what I believe too (saving to separate files would be better), but I was told to save it all together. I thought about reading it as strings and then parsing all the floats and separating them into the original arrays (name, value, desc). I tried `fmt="%s"`, but when I read it back from the file the new array contains string literals(?), like: '[["b'label1'" "b'0.1'" "b'0.4'" "b'0.5'" "b'desc1'"] ["b'label2'" "b'0.1'" "b'0.2'" "b'0.1'" "b'desc2'"] ["b'label3'" "b'0.5'" "b'0.6'" "b'1.0'" "b'desc3'"]]'. Is there any problem in dealing with strings like this? — plethora, Jun 18 '17 at 01:13
@hpaulj btw, I used `np.loadtxt('output.txt', dtype=np.str)` to read the file. — plethora, Jun 18 '17 at 01:18
In Py3 the default string type is unicode; `b` marks a bytestring (the default for Py2). `load/genfromtxt` load bytestrings as the default. — hpaulj, Jun 18 '17 at 01:27
https://stackoverflow.com/questions/36507283/shape-of-a-structured-array-in-numpy - a small example of saving a mix of string and numbers. — hpaulj, Jun 18 '17 at 01:30
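
For reference, here is a minimal sketch of the `fmt="%s"` route discussed in the comments above, including the read-back step. It assumes a NumPy version whose `loadtxt` accepts the `encoding` argument; with that, the fields should come back as plain strings rather than the `b'...'` forms quoted above, but that depends on the installed version. The names `table`, `path` and the `*_back` variables are illustrative only, and the temporary directory merely stands in for the real output location.

```python
import os
import tempfile

import numpy as np

labels = ['label1', 'label2', 'label3']
values = [[0.1, 0.4, 0.5],
          [0.1, 0.2, 0.1],
          [0.5, 0.6, 1.0]]
descriptions = ['desc1', 'desc2', 'desc3']

# Build one all-string table: label column, float columns, description column.
table = np.column_stack([labels, np.array(values).astype(str), descriptions])

# Illustrative output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# A single '%s' is enough: savetxt repeats it once per column.
np.savetxt(path, table, fmt='%s')

# Read every field back as text, then split the columns apart again.
loaded = np.loadtxt(path, dtype=str, encoding='utf-8')
labels_back = loaded[:, 0].tolist()
values_back = loaded[:, 1:-1].astype(float)
descriptions_back = loaded[:, -1].tolist()
```

This only works if no label or description contains whitespace, because `loadtxt` splits on whitespace by default.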

Heiko Oberdiek · Answer 1 · 2017-06-17T23:23:31.093

A solution without numpy:

labels = ['label1', 'label2', 'label3']
values = [[0.1, 0.4, 0.5],
          [0.1, 0.2, 0.1],
          [0.5, 0.6, 1.0]]
descriptions = ['desc1', 'desc2', 'desc3']

with open('output.txt', 'w') as handle:
    for label, nums, description in zip(labels, values, descriptions):
        handle.write('{} {} {}\n'.format(
            label,
            ' '.join(map(str, nums)),
            description,
        ))

Contents of output.txt:

label1 0.1 0.4 0.5 desc1
label2 0.1 0.2 0.1 desc2
label3 0.5 0.6 1.0 desc3

Or starting from concat2:

with open('output.txt', 'w') as handle:
    for row in concat2:
        handle.write(' '.join(row))
        handle.write('\n')
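
Reading the file back can be done without numpy as well. The following is a sketch only, assuming that labels and descriptions never contain whitespace, so the first and last fields of each line can be split off safely. `read_rows` is a hypothetical helper name, and the demo writes its own small file to a temporary directory so that it runs on its own.

```python
import os
import tempfile


def read_rows(path):
    """Split each line into a label, a list of floats and a description."""
    labels, values, descriptions = [], [], []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            labels.append(fields[0])
            values.append([float(field) for field in fields[1:-1]])
            descriptions.append(fields[-1])
    return labels, values, descriptions


# Demo: write the three example lines to a temporary file, then read them back.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')
with open(path, 'w') as handle:
    handle.write('label1 0.1 0.4 0.5 desc1\n')
    handle.write('label2 0.1 0.2 0.1 desc2\n')
    handle.write('label3 0.5 0.6 1.0 desc3\n')

labels, values, descriptions = read_rows(path)
```

Because only the first and last fields are treated as strings, the same function handles rows with thousands of floats without any change.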
