
I am trying to save an array consisting of both floats and one column of strings, and I am getting some really weird results. This is what I have tried:

data = np.column_stack((f1, f2, f3, s1))

The first column (f1) contains long floats (up to 10 digits, but I only need 2 decimals). I also need 2-3 decimals in the second and third columns, f2 and f3 respectively. The last column, s1, only contains two different strings: 'FeI' and 'FeII'.

The problem is that when I print data, I get something like this:

[['7352'  '11.7'  '-4.9'  'FeI']
 ...,
 ['5340'  '22.8'  '-8.2'  'FeII']]

I would like to get something like this instead (I don't care if it saves the floats as strings, since I can easily load them as floats afterwards):

[['7352.91'  '11.78'  '-4.92'  'FeI']
 ...,
 ['53407.66'  '22.82'  '-8.27'  'FeII']]

As you can see, the main problem is that 53407.66 turns into 5340, which is an order of magnitude off!

Possible solution: use np.array instead, with the dtype option. However, I don't know how to store one column as strings. Any help?

3 Answers


Use a structured array to hold the data, instead of using column_stack.

Suppose this is your data:

In [30]: f1
Out[30]: array([ 12.3,  45.6,  78.9])

In [31]: f2
Out[31]: array([ 10.11,  12.13,  14.15])

In [32]: f3
Out[32]: array([ 1. ,  2.5,  5. ])

In [33]: s1
Out[33]: 
array(['foo', 'bar', 'baz'], 
      dtype='|S3')

Here's how you can create a structured array. The first argument is a list of tuples. Each tuple holds the values for each structured element of the array. The dtype argument defines the data types of the fields in the structure. In this case, there are three floating point fields (named 'f1', 'f2' and 'f3'), and one field (named 's1') containing strings of at most 16 characters:

In [34]: data = np.array(list(zip(f1, f2, f3, s1)), dtype=[('f1', float), ('f2', float), ('f3', float), ('s1', 'S16')])

In [35]: data
Out[35]: 
array([(12.3, 10.11, 1.0, 'foo'), (45.6, 12.13, 2.5, 'bar'),
       (78.9, 14.15, 5.0, 'baz')], 
      dtype=[('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('s1', 'S16')])
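The session above is Python 2 output. On Python 3, `zip` returns an iterator (so it has to be wrapped in `list`), and an `'S16'` field holds `bytes` rather than `str`. A minimal Python 3 sketch of the same structured array, assuming the example arrays above and swapping in the Unicode dtype `'U16'` for the string field, might look like this:

```python
import numpy as np

# Same example values as in the session above
f1 = np.array([12.3, 45.6, 78.9])
f2 = np.array([10.11, 12.13, 14.15])
f3 = np.array([1.0, 2.5, 5.0])
s1 = np.array(['foo', 'bar', 'baz'])

# Three float64 fields plus one Unicode string field of at most 16 characters
# ('U16' stores str on Python 3; 'S16' would store bytes)
dt = [('f1', float), ('f2', float), ('f3', float), ('s1', 'U16')]

# np.array needs a sequence of tuples, so materialise the zip iterator first
data = np.array(list(zip(f1, f2, f3, s1)), dtype=dt)

# Each field keeps its own dtype and is accessed by name
print(data['f1'])   # float column, full precision kept
print(data['s1'])   # string column
print(data[1])      # a single record (row)
```

Because the floats stay float64, nothing is truncated; formatting only happens when the data is written out.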

To control the format of the fields when this is saved with np.savetxt, you can give it a list of formats, one for each field:

In [36]: np.savetxt('output.txt', data, fmt=["%.3f",]*3 + ["%s"])

In [37]: !cat output.txt
12.300 10.110 1.000 foo
45.600 12.130 2.500 bar
78.900 14.150 5.000 baz
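A Python 3 sketch of the same `np.savetxt` call, writing to an in-memory `io.StringIO` buffer instead of a file and using per-column precisions matching the question (2 decimals for the first column, 3 for the next two). The values are illustrative, modelled on the question's desired output, and the string field uses `'U16'` because on Python 3 `'%s'` applied to an `'S16'` (bytes) field would write `b'FeI'` instead of `FeI`:

```python
import io
import numpy as np

dt = [('f1', float), ('f2', float), ('f3', float), ('s1', 'U16')]
# Illustrative values, not the asker's real data
data = np.array([(7352.913, 11.7812, -4.9213, 'FeI'),
                 (53407.6612, 22.8234, -8.2701, 'FeII')], dtype=dt)

# One format per field: 2 decimals for f1, 3 for f2 and f3, %s for the string
buf = io.StringIO()
np.savetxt(buf, data, fmt=['%.2f', '%.3f', '%.3f', '%s'])
text = buf.getvalue()
print(text)

# Reading it back: the numeric columns load as floats again
nums = np.loadtxt(io.StringIO(text), usecols=(0, 1, 2))
print(nums)
```

The `usecols` argument skips the string column so `np.loadtxt` can parse the rest with its default float dtype.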

Note: another option to consider is putting your data into a pandas DataFrame and using its to_csv method.

Warren Weckesser

The solution using zip should work for most cases, but I think it might not be the most efficient one. I also ran into a problem when one of the arrays had the dtype np.datetime64. Here is another solution using pandas:

import pandas as pd
import numpy as np

f1 = np.array([ 12.3,  45.6,  78.9])
f2 = np.array([ 10.11,  12.13,  14.15])
f3 = np.array([ 1. ,  2.5,  5. ])
s1 = np.array(['foo', 'bar', 'baz'])
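# NumPy warns when parsing strings with a UTC offset such as '+0200', so these
# are the same instants written in UTC (02:58:22+02:00 == 00:58:22 UTC)
d1 = np.array(['2015-04-30T00:58:22.000', '2015-04-30T00:58:22.000',
       '2015-04-30T00:58:22.000'], dtype='datetime64[ms]')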
df = pd.DataFrame({
            'f1':f1,
            'f2':f2,
            'f3':f3,
            'str1':s1,
            'date':d1
})
df.to_csv('out.csv')
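To also get the fixed number of decimals the question asks for, `to_csv` accepts a `float_format` string (applied to every float column) and a `date_format` string for datetime columns. A sketch along the lines of the code above, assuming naive UTC timestamps and 2 decimals everywhere; `index=False` drops the row index, and with no path `to_csv` returns the CSV text instead of writing a file:

```python
import numpy as np
import pandas as pd

f1 = np.array([12.3, 45.6, 78.9])
f2 = np.array([10.11, 12.13, 14.15])
f3 = np.array([1.0, 2.5, 5.0])
s1 = np.array(['foo', 'bar', 'baz'])
# Naive (offset-free) timestamps, interpreted as UTC
d1 = np.array(['2015-04-30T00:58:22'] * 3, dtype='datetime64[ms]')

df = pd.DataFrame({'f1': f1, 'f2': f2, 'f3': f3, 'str1': s1, 'date': d1})

# float_format applies to all float columns; date_format to datetime columns
csv_text = df.to_csv(index=False, float_format='%.2f',
                     date_format='%Y-%m-%dT%H:%M:%S')
print(csv_text)
```

`float_format` is a single format for all float columns; for different precisions per column, the per-field `fmt` list of `np.savetxt` is the more direct tool.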
ntg

Why not pre-process the data before storing it? For example:

# Format the numbers directly: an 'f' format spec needs a number, not str(item)
f1 = ['{0:0.2f}'.format(item) for item in f1]
f2 = ['{0:0.3f}'.format(item) for item in f2]
f3 = ['{0:0.3f}'.format(item) for item in f3]
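Putting the pre-formatting idea together with the original column_stack call: once every column is a list of strings, column_stack only combines string arrays, and the result is as wide as the longest entry, so nothing is truncated. A sketch with illustrative values modelled on the question's desired output (not the asker's real data):

```python
import numpy as np

# Illustrative values, not the asker's real data
f1 = np.array([7352.913, 53407.6612])
f2 = np.array([11.7812, 22.8234])
f3 = np.array([-4.9213, -8.2701])
s1 = ['FeI', 'FeII']

# Format each float column to a fixed number of decimals before stacking
f1s = ['{0:0.2f}'.format(item) for item in f1]
f2s = ['{0:0.3f}'.format(item) for item in f2]
f3s = ['{0:0.3f}'.format(item) for item in f3]

# All columns are str now, so the stacked array gets a string dtype
# wide enough for the longest entry ('53407.66')
data = np.column_stack((f1s, f2s, f3s, s1))
print(data)

# The numeric columns can be converted back to floats when needed
print(data[:, 0].astype(float))
```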

If you plan to use the values later, you should probably only represent the floats as strings when you print them, not when you store them in your array.
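A sketch of that advice: keep the arrays numeric and build formatted text only at output time. The values are illustrative, and the precision in each f-string is one possible choice, not something the answer prescribes:

```python
import numpy as np

# Illustrative values, not the asker's real data
f1 = np.array([7352.913, 53407.6612])
f2 = np.array([11.7812, 22.8234])
f3 = np.array([-4.9213, -8.2701])
s1 = ['FeI', 'FeII']

# Formatting happens only here; the arrays themselves stay float64
rows = [f'{a:.2f} {b:.3f} {c:.3f} {s}' for a, b, c, s in zip(f1, f2, f3, s1)]
for row in rows:
    print(row)
```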

Steinar Lima