Numpy converts scientific notation to nan while reading CSV

Question

I encountered a problem while reading a CSV file with np.genfromtxt. All the records in the CSV are in scientific notation, yet when I read the file with np.genfromtxt, every item in the resulting array is nan.

An example row from CSV: 1.02E+02;1.64E+00

In [1]: read = np.genfromtxt('13G-mapa-0001.CSV', delimiter=';')
In [2]: read
Out[2]:
array([[nan, nan],
       [nan, nan],
       [nan, nan],
       ...,
       [nan, nan],
       [nan, nan],
       [nan, nan]])

Full file:

1,204619e+002;1,639486e+000 
1,214262e+002;1,623145e+000 
1,223904e+002;1,607553e+000 
1,233547e+002;1,592153e+000 
1,243189e+002;1,576472e+000 
1,252832e+002;1,560220e+000 
1,262474e+002;1,543355e+000 
1,272117e+002;1,526069e+000 
1,281759e+002;1,508706e+000 
1,291402e+002;1,491635e+000 
1,301044e+002;1,475144e+000 
1,310686e+002;1,459387e+000 
1,320329e+002;1,444416e+000

Can't reproduce: using a CSV with only your example row, it works (returns `array([102. , 1.64])`). – sacuL, Sep 20 '18 at 16:45
Please can you add the file to your question? This will help others in future. – Jon Scott, Sep 20 '18 at 16:51

Atul Shanbhag · Answer 1 · 2018-09-20T17:08:56.703

1

Your delimiter must be a comma ',' not a semicolon ';'.

EDIT: The issue is that the values themselves use commas as decimal separators, such as 1,25e+00, so they need to be converted to dots before parsing.

import numpy as np
from io import StringIO

def genfromtxt(file):
    # Replace decimal commas with dots, then let NumPy parse the text
    with open(file, 'r') as f:
        text = f.read().replace(',', '.')
    return np.genfromtxt(StringIO(text), delimiter=';', dtype=np.float32)

This is my solution
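
To see why the unmodified file produces nan, here is a minimal sketch that parses the first two rows of the posted file from an in-memory string instead of a file on disk. The names raw, bad and good are only for illustration.

```python
import io
import numpy as np

# First two rows of the file from the question, held in memory
raw = "1,204619e+002;1,639486e+000\n1,214262e+002;1,623145e+000\n"

# Unmodified text: every field contains a decimal comma, so it cannot be
# parsed as a float and genfromtxt fills in nan
bad = np.genfromtxt(io.StringIO(raw), delimiter=';')

# Same text with the decimal commas replaced by dots
good = np.genfromtxt(io.StringIO(raw.replace(',', '.')), delimiter=';')

print(bad)
print(good)
```

The semicolon delimiter is fine in both calls; only the decimal separator differs.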

edited Sep 20 '18 at 17:08

answered Sep 20 '18 at 16:48


I don't think that's true. https://stackoverflow.com/questions/10140999/csv-with-comma-or-semicolon – mypetlion Sep 20 '18 at 16:51
Yeah, I just noticed that. This brings me to another problem: is there a way to convert every comma to a dot in readings done by genfromtxt? – user10392573 Sep 20 '18 at 17:02
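
On the follow-up about doing the conversion inside genfromtxt itself: one option that may work is the converters argument, which maps a column index to a function applied to each field. The sketch below is an assumption-laden illustration, not something posted in this thread. The helper name comma_float is hypothetical, and it handles both bytes and str because what a converter receives can depend on the NumPy version and the encoding setting.

```python
import io
import numpy as np

def comma_float(field):
    # A converter may receive bytes or str depending on NumPy version/encoding
    if isinstance(field, bytes):
        field = field.decode()
    # Swap the decimal comma for a dot before converting to float
    return float(field.replace(',', '.'))

# First two rows of the file from the question, held in memory
raw = "1,204619e+002;1,639486e+000\n1,214262e+002;1,623145e+000\n"

# Apply the same converter to both columns (indices 0 and 1)
read = np.genfromtxt(io.StringIO(raw), delimiter=';',
                     converters={0: comma_float, 1: comma_float})
print(read)
```

For a file with many columns, replacing the commas before parsing is probably simpler than listing a converter per column.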

score 1 · Answer 2 · answered Sep 20 '18 at 17:09

Based on this answer, you can do the following to convert your comma decimal:

import numpy as np

def conv(x):
    # Swap the decimal comma for a dot; encode to bytes for genfromtxt
    return x.replace(',', '.').encode()

with open("x.csv") as f:
    read = np.genfromtxt((conv(x) for x in f), delimiter=';')

>>> read
array([[120.4619  ,   1.639486],
       [121.4262  ,   1.623145],
       [122.3904  ,   1.607553],
       [123.3547  ,   1.592153],
       [124.3189  ,   1.576472],
       [125.2832  ,   1.56022 ],
       [126.2474  ,   1.543355],
       [127.2117  ,   1.526069],
       [128.1759  ,   1.508706],
       [129.1402  ,   1.491635],
       [130.1044  ,   1.475144],
       [131.0686  ,   1.459387],
       [132.0329  ,   1.444416]])

score 1 · Answer 3 · answered Sep 20 '18 at 18:34

A modern, fast and versatile way to do that is provided by pandas:

import pandas as pd
table = pd.read_csv('data.csv', sep=';', decimal=',', header=None)
arr = table.values

which gives

array([[ 120.4619  ,    1.639486],
       [ 121.4262  ,    1.623145],
       [ 122.3904  ,    1.607553],
       [ 123.3547  ,    1.592153],
       [ 124.3189  ,    1.576472],
       [ 125.2832  ,    1.56022 ],
       [ 126.2474  ,    1.543355],
       [ 127.2117  ,    1.526069],
       [ 128.1759  ,    1.508706],
       [ 129.1402  ,    1.491635],
       [ 130.1044  ,    1.475144],
       [ 131.0686  ,    1.459387],
       [ 132.0329  ,    1.444416]])

read_csv offers more high-level options than genfromtxt.
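
As a rough illustration of those options, the sketch below reads the first two rows from an in-memory string and names the columns while parsing. The column names x and y are made up for the example, and to_numpy() is used as an alternative to .values.

```python
import io
import pandas as pd

# First two rows of the file from the question, held in memory
raw = "1,204619e+002;1,639486e+000\n1,214262e+002;1,623145e+000\n"

# sep and decimal describe the file format; names labels the two columns
# (the labels "x" and "y" are hypothetical)
table = pd.read_csv(io.StringIO(raw), sep=';', decimal=',',
                    header=None, names=['x', 'y'])

arr = table.to_numpy()   # plain 2-D float array
print(table.dtypes)
print(arr)
```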
