This is similar to "How to convert an array of strings to an array of floats in numpy".

I have a list of strings:

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]

Here are my failed attempts to make a numpy record array:

import numpy as np
dat_dtype = [
    ('I', 'i'),
    ('J', 'i'),
    ('val1', 'd'),
    ('val2', 'd'),
]

# Attempt 1
np.array(dat, dat_dtype)
# looks like garbage

# Attempt 2
np.array([x.split() for x in dat], dtype=dat_dtype)
# looks like different garbage

# Attempt 3
string_ndarray = np.array([x.split() for x in dat], dtype='|S15')
# looks good so far
string_ndarray.astype(dat_dtype)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.040000e+005'

I give up. Here's the only way I can get the expected output:

dat_ndarray = np.zeros(len(dat), dat_dtype)
for i, line in enumerate(dat):
    dat_ndarray[i] = tuple(line.split())

print(dat_ndarray)  # [(1, 2, 104000.0, 0.03) (2, 7, 0.0, 0.03) (3, 15, 0.0, 0.03)]

Is there a more direct method to get the expected record array?

Mike T

2 Answers

Your input is lines of text, so you can use a text reader to convert it to an array (structured or plain). Here's one way to do that with numpy.genfromtxt:

np.genfromtxt(dat, dtype=dat_dtype)

For example,

In [204]: dat
Out[204]: 
['  1  2  1.040000e+005  0.030000\n',
 '  2  7  0.000000e+000  0.030000\n',
 '  3  15  0.000000e+000  0.030000\n']

In [205]: dat_dtype
Out[205]: [('I', 'i'), ('J', 'i'), ('val1', 'f'), ('val2', 'f')]

In [206]: np.genfromtxt(dat, dtype=dat_dtype)
Out[206]: 
array([(1, 2, 104000.0, 0.029999999329447746), (2, 7, 0.0, 0.029999999329447746), (3, 15, 0.0, 0.029999999329447746)], 
      dtype=[('I', '<i4'), ('J', '<i4'), ('val1', '<f4'), ('val2', '<f4')])
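
A self-contained version of the above, as a minimal sketch. It assumes a NumPy release recent enough that `genfromtxt` accepts a list of `str` lines under Python 3, and it uses the question's `'d'` (float64) fields rather than the `'f'` (float32) fields shown in the session above, so `val2` should not pick up the float32 rounding seen there.

```python
import numpy as np

# Sample lines copied from the question
dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]

# The question's dtype: two C ints and two doubles
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

# genfromtxt treats each list item as one line of whitespace-delimited text
arr = np.genfromtxt(dat, dtype=dat_dtype)

print(arr.dtype.names)  # field names taken from dat_dtype
print(arr['val1'])      # one column, accessed by field name
```

Passing the list directly avoids writing the lines to a temporary file or wrapping them in a file-like object first.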
Warren Weckesser
  • This looks undocumented, as the first argument `fname` is neither a file nor a string. – Mike T Aug 26 '15 at 04:17
  • The documentation [is now corrected](https://github.com/numpy/numpy/commit/ae2d0bb7c2d227e893195cc3e52477567781e2db) to describe this feature. – Mike T Oct 06 '15 at 23:50

With your dat and dat_dtype this works:

In [667]: np.array([tuple(x.strip().split()) for x in dat],dtype=dat_dtype)
Out[667]: 
array([(1, 2, 104000.0, 0.03), (2, 7, 0.0, 0.03), (3, 15, 0.0, 0.03)], 
  dtype=[('I', '<i4'), ('J', '<i4'), ('val1', '<f8'), ('val2', '<f8')])

Structured arrays are best created from lists of tuples. I stripped off the \n, split each line on whitespace, and then formed tuples:

In [668]: [tuple(x.strip().split()) for x in dat]
Out[668]: 
[('1', '2', '1.040000e+005', '0.030000'),
 ('2', '7', '0.000000e+000', '0.030000'),
 ('3', '15', '0.000000e+000', '0.030000')]

I let the dat_dtype take care of the string to number conversion.
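
The same idea can be wrapped in a small function, sketched here under the assumption that every line has exactly one whitespace-separated token per dtype field. The name `lines_to_records` is hypothetical and is not part of NumPy.

```python
import numpy as np

def lines_to_records(lines, dtype):
    """Hypothetical helper: build a structured array from text lines.

    Each line is split on whitespace and turned into a tuple, so NumPy
    reads it as one record; the dtype converts each string token.
    """
    rows = [tuple(line.split()) for line in lines]
    return np.array(rows, dtype=dtype)

# Sample data and dtype copied from the question
dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

rec = lines_to_records(dat, dat_dtype)
print(rec['J'])  # integer column, accessed by field name
```

`str.split()` with no arguments already discards leading and trailing whitespace, including the newline, so no separate `strip()` call is used here.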

hpaulj
  • This is the same as my attempt 2, but with `tuple(x.split())`. It doesn't seem that white space matters. – Mike T Aug 26 '15 at 04:19
  • Yes, the `.strip()` wasn't needed. It's just a habit from reading text lines - remove the `\n` before splitting into words. Without the `tuple` the best you get is an array of strings. – hpaulj Aug 26 '15 at 04:29