
I ran into a problem while reading a CSV file with np.genfromtxt. All the records in the CSV are in scientific notation, yet after reading the file with np.genfromtxt every item in the resulting array is nan.

An example row from CSV: 1.02E+02;1.64E+00

In [1]: read = np.genfromtxt('13G-mapa-0001.CSV', delimiter=';')
In [2]: read
Out[2]:
array([[nan, nan],
       [nan, nan],
       [nan, nan],
       ...,
       [nan, nan],
       [nan, nan],
       [nan, nan]])

Full file:

1,204619e+002;1,639486e+000 
1,214262e+002;1,623145e+000 
1,223904e+002;1,607553e+000 
1,233547e+002;1,592153e+000 
1,243189e+002;1,576472e+000 
1,252832e+002;1,560220e+000 
1,262474e+002;1,543355e+000 
1,272117e+002;1,526069e+000 
1,281759e+002;1,508706e+000 
1,291402e+002;1,491635e+000 
1,301044e+002;1,475144e+000 
1,310686e+002;1,459387e+000 
1,320329e+002;1,444416e+000
sacuL

3 Answers


Your delimiter must be a comma ',', not a semicolon ';'.

EDIT: The actual issue is that the numbers use a comma as the decimal separator, e.g. 1,25e+00, so the commas need to be replaced with dots before parsing:

from io import BytesIO
import numpy as np

def genfromtxt(file):
    # Read the whole file and swap the decimal commas for dots.
    with open(file, 'r') as f:
        text = f.read().replace(',', '.')
    # The columns are still separated by ';'.
    return np.genfromtxt(BytesIO(text.encode('utf-8')), delimiter=';', dtype=np.float32)

This is my solution
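To see why the unconverted file produced nan, here is a minimal sketch using one row from the question. With a float dtype, genfromtxt fills in nan for any field it cannot parse, and a field with a decimal comma is one of those. The helper name parses_as_float is made up for this illustration.

```python
row = "1,204619e+002;1,639486e+000"   # one line from the question's file
fields = row.split(";")               # ';' separates the two columns

def parses_as_float(text):
    """Return True if Python's float() accepts the text (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False

# The decimal comma is not valid float syntax, so both fields are rejected.
as_written = [parses_as_float(f) for f in fields]

# After swapping the comma for a dot, the same fields convert cleanly.
converted = [float(f.replace(",", ".")) for f in fields]
```

So the ';' delimiter was never the problem; only the decimal separator has to change.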

Atul Shanbhag
  • I don't think that's true. https://stackoverflow.com/questions/10140999/csv-with-comma-or-semicolon – mypetlion Sep 20 '18 at 16:51
  • Yeah, I just noticed that. This brings me to another problem: is there a way to convert every comma to a dot in the values read by genfromtxt? – user10392573 Sep 20 '18 at 17:02

Based on this answer, you can replace the decimal commas with dots line by line, before genfromtxt parses them:

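def conv(x):
    # Swap the decimal comma for a dot and hand genfromtxt the line as bytes.
    return x.replace(',', '.').encode()

# The generator converts each line lazily; the with block closes the file afterwards.
with open("x.csv") as f:
    read = np.genfromtxt((conv(x) for x in f), delimiter=';')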

>>> read
array([[120.4619  ,   1.639486],
       [121.4262  ,   1.623145],
       [122.3904  ,   1.607553],
       [123.3547  ,   1.592153],
       [124.3189  ,   1.576472],
       [125.2832  ,   1.56022 ],
       [126.2474  ,   1.543355],
       [127.2117  ,   1.526069],
       [128.1759  ,   1.508706],
       [129.1402  ,   1.491635],
       [130.1044  ,   1.475144],
       [131.0686  ,   1.459387],
       [132.0329  ,   1.444416]])
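As an alternative sketch, genfromtxt also takes a converters argument that maps a column index to a function applied to each raw field. The example below assumes a NumPy version where encoding='utf-8' makes the converter receive str rather than bytes. It reads the rows from an in-memory list instead of a file, and comma_float and lines are names invented here.

```python
import numpy as np

# Two rows from the question, held in memory instead of read from a file.
lines = [
    "1,204619e+002;1,639486e+000",
    "1,214262e+002;1,623145e+000",
]

def comma_float(text):
    # Hypothetical helper: turn '1,204619e+002' into the float 120.4619.
    return float(text.replace(",", "."))

# Apply the helper to both columns. encoding='utf-8' is assumed to make
# genfromtxt pass str values (not bytes) to the converter.
read = np.genfromtxt(
    lines,
    delimiter=";",
    converters={0: comma_float, 1: comma_float},
    encoding="utf-8",
)
```

This leaves the input untouched and does the conversion per field, at the cost of listing every column that needs it.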
sacuL

pandas provides a modern, fast and versatile way to do this:

import pandas as pd

# sep=';' splits the columns; decimal=',' parses the decimal commas.
table = pd.read_csv('data.csv', sep=';', decimal=',', header=None)
arr = table.values

This gives:

array([[ 120.4619  ,    1.639486],
       [ 121.4262  ,    1.623145],
       [ 122.3904  ,    1.607553],
       [ 123.3547  ,    1.592153],
       [ 124.3189  ,    1.576472],
       [ 125.2832  ,    1.56022 ],
       [ 126.2474  ,    1.543355],
       [ 127.2117  ,    1.526069],
       [ 128.1759  ,    1.508706],
       [ 129.1402  ,    1.491635],
       [ 130.1044  ,    1.475144],
       [ 131.0686  ,    1.459387],
       [ 132.0329  ,    1.444416]])

read_csv offers more high-level options than genfromtxt.
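For a version that can be run without the original file, here is a small sketch that feeds two rows from the question through read_csv from an in-memory buffer. The variable name sample and the use of io.StringIO are assumptions for the illustration; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Two rows from the question, wrapped in a file-like object for the example.
sample = io.StringIO(
    "1,204619e+002;1,639486e+000\n"
    "1,214262e+002;1,623145e+000\n"
)

# sep=';' splits the columns, decimal=',' handles the decimal commas,
# and header=None is used because the data has no header row.
table = pd.read_csv(sample, sep=";", decimal=",", header=None)

# to_numpy() returns the values as a NumPy array, like table.values above.
arr = table.to_numpy()
```

Both columns should come back as floats, so no separate comma-to-dot pass is needed.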

B. M.