I am trying to read a data file written by a Fortran program, in which every once in a while there is a very small float like 0.3299880-104. The error message is:

>np.loadtxt(filename, usecols = (1,))

  File "/home/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 928, in loadtxt
    items = [conv(val) for (conv, val) in zip(converters, vals)]

  File "/home/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 659, in floatconv
    return float(x)

ValueError: invalid literal for float(): 0.3299880-104

Can I do something to make Numpy able to read this data file anyway?

francescalus
Ohm
  • Are you sure the number you're trying to read is `0.3299880e-104` and not just `0.3299880-104`? – Shark Dec 31 '15 at 10:37
  • Personally I'd use a utility (I like `sed`) to modify numbers from `0.3299880-104` to `0.3299880e-104`. I believe that Python itself can do that sort of thing, so you might want to write a routine to massage the file before reading it. – High Performance Mark Dec 31 '15 at 10:44
  • Possibly useful: http://stackoverflow.com/q/13274066. – francescalus Dec 31 '15 at 10:47
  • @Shark - the number I am trying to read is without the `e`, just `0.3299880-104` – Ohm Dec 31 '15 at 11:07
  • @HighPerformanceMark - I don't think it is a duplicate of the question 'Read FORTRAN formatted numbers with Python', as I am asking if there is a way to do it automatically with NumPy – Ohm Dec 31 '15 at 11:10
  • I'm inclined to agree. It looks like you might develop a `converters` dictionary for `loadtxt` to handle this. I'd suggest you put a little work into figuring that out and then pose a new, more specific question. (Or a new answer to the linked question if you do figure it out) – agentp Dec 31 '15 at 16:13

3 Answers

As @agentp mentioned in the comments, one approach is to pass a converters= argument to np.genfromtxt that inserts the missing e characters before casting to float:

import numpy as np

# some example strings
strings = "0.3299880-104 0.3299880+104 0.3299880"

# create a "dummy file" (see http://stackoverflow.com/a/11970414/1461210)
try:
    from StringIO import StringIO     # Python2
    f = StringIO(strings)
except ImportError:
    from io import BytesIO            # Python3
    f = BytesIO(strings.encode())

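def c(s):
    # depending on the NumPy version, converters receive bytes or str
    if isinstance(s, bytes):
        s = s.decode()
    # insert the missing 'e' in front of the exponent sign
    return float(s.replace('+', 'e').replace('-', 'e-'))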

data = np.genfromtxt(f, converters=dict(zip(range(3), [c]*3)))

print(repr(data))
# array([  3.29988000e-105,   3.29988000e+103,   3.29988000e-001])
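
Since the question asks about np.loadtxt specifically, the same idea can probably be carried over to its converters= argument. The following is only a minimal sketch and not part of the original answer: the helper name fortran_float and the two-column sample data are assumptions made for illustration, and the regex only handles a missing exponent letter (not d/D exponents).

```python
import io
import re

import numpy as np

# Matches a digit or dot followed directly by a signed exponent,
# e.g. the "0-1" in "0.3299880-104".
_MISSING_E = re.compile(r"([\d.])([+-])(\d)")

def fortran_float(field):
    """Hypothetical converter: parse a float whose exponent letter may be missing."""
    # Depending on the NumPy version, converters receive bytes or str.
    if isinstance(field, bytes):
        field = field.decode()
    return float(_MISSING_E.sub(r"\1e\2\3", field))

# Made-up two-column sample standing in for the Fortran output file.
sample = io.StringIO("1 0.3299880-104\n2 0.5E+03\n3 -1.25\n")

# Converter keys are column numbers in the file; only column 1 is read here.
data = np.loadtxt(sample, usecols=(1,), converters={1: fortran_float})
print(data)
```

A leading minus sign and values that already contain an E are left untouched, because the pattern requires a digit or dot immediately before the sign.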
ali_m

The accepted answer is helpful, but it does not support negative values (-0.3299880 is converted to e-0.3299880) or exponents that are already written with an E (0.3299880E+10 is converted to 0.3299880Ee10). Neither result makes sense, and both would produce nan values in the numpy array.

Also, the number of columns in the file to read is hard-coded (it is 3 in this case).

Both problems can be addressed as follows:

import re
import numpy as np

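def read_fortran_data_file(file):
    # count the columns in the first row of data
    number_columns = np.genfromtxt(file, max_rows=1).shape[0]

    def c(s):
        # converters may receive bytes or str, depending on the NumPy version
        if isinstance(s, bytes):
            s = s.decode()
        # insert the missing E between the mantissa and the signed exponent
        return float(re.sub(r"(\d)([+-])(\d)", r"\1E\2\3", s))

    # actually load the content of our file
    data = np.genfromtxt(file,
        converters=dict(zip(range(number_columns), [c] * number_columns)))
    return data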

Testing

np.genfromtxt accepts filenames or lists of strings as input. For the demonstration I'll use the latter, but the above function works fine with filenames as input.

strings = [
    "0.3299880-104 0.3299880E+10 0.3299880 0.3299880+104 0.3299880E-10 -0.3299880"
]
read_fortran_data_file(strings)
## array([ 3.29988e-105,  3.29988e+009,  3.29988e-001,  3.29988e+103,
##         3.29988e-011, -3.29988e-001])

Note on NaN values:

When using np.genfromtxt, one must be careful with NaN values, which replace numbers that were not read properly. They can be caught with, e.g., the following assertion:

assert np.count_nonzero(np.isnan(data))==0, "data contains nan values"
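
A bare assertion reports that something went wrong but not where. As a small sketch (the function name find_unparsed and the sample array are made up for illustration), np.argwhere can list the positions of the NaN entries so that the offending fields can be looked up in the file:

```python
import numpy as np

def find_unparsed(data):
    """Return the (row, column) indices of NaN entries in a 1-D or 2-D array."""
    # np.atleast_2d turns a single row into shape (1, n), so indices are always pairs.
    return np.argwhere(np.isnan(np.atleast_2d(data)))

# Made-up array standing in for the result of np.genfromtxt.
data = np.array([[1.0, np.nan], [2.0, 3.0]])
bad = find_unparsed(data)
for row, col in bad:
    print("could not parse row %d, column %d" % (row, col))
```

Note that this also flags entries that are legitimately NaN in the input, so it is only useful when the file is not expected to contain any.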
arna

Not numpy, but I use the following regex and function:

import re

# convert d/D to e and add it if missing
fortregexp = re.compile(r'([\d.])[dD]?(((?<=[dD])[+-]?|[+-])\d)')
def fortran_float(num):
    num = fortregexp.sub(r'\1e\2', num)
    return float(num)

text = "0.3299880-104 0.3299880D+10 0.3299880 0.3299880+104 0.3299880E-10 -0.3299880"

nums = [fortran_float(i) for i in text.split()]

print(text)
print(nums)

which gives:

0.3299880-104 0.3299880D+10 0.3299880 0.3299880+104 0.3299880E-10 -0.3299880
[3.29988e-105, 3299880000.0, 0.329988, 3.29988e+103, 3.29988e-11, -0.329988]
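
The same substitution can also be applied to the whole file content at once, along the lines of the "massage the file before reading it" suggestion in the comments. This is a rough sketch under assumptions: the helper name normalize_fortran_text and the sample text are hypothetical, and fields are assumed to be separated by whitespace. Fixed-width output in which two numbers touch (such as 0.5-0.3) would be misread as an exponent.

```python
import re

# Same pattern as in the answer above: convert d/D to e, or add e if it is missing.
_FORTRAN_EXP = re.compile(r'([\d.])[dD]?(((?<=[dD])[+-]?|[+-])\d)')

def normalize_fortran_text(text):
    """Hypothetical helper: rewrite Fortran-style exponents in a block of text."""
    return _FORTRAN_EXP.sub(r'\1e\2', text)

# Made-up sample standing in for the content of the data file.
raw = "1 0.3299880-104\n2 0.25D+02\n3 -0.5\n"
cleaned = normalize_fortran_text(raw)

# After normalization every field is a plain Python float literal.
rows = [[float(tok) for tok in line.split()] for line in cleaned.splitlines()]
print(cleaned)
print(rows)
```

The cleaned text could then be written back to disk or wrapped in io.StringIO and handed to a regular reader without any converters.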
Jellby