I have several numeric data files in which the decimal separator is a comma, so I use a lambda function to do the conversion:

import numpy as np

def decimal_converter(num_cols):
    # Map every column index to a converter that swaps the decimal comma for a point.
    conv = dict((col, lambda valstr: float(valstr.decode('utf-8').replace(',', '.')))
                for col in range(num_cols))
    return conv

data = np.genfromtxt("file.csv", delimiter=';', converters=decimal_converter(3))

The data in the file looks like this:

0; 0,28321815;  0,5819178
1; 0,56868281;  0,85621369
2; 0,24022026;  0,53490058
3; 0,63641921;  0,0293904
4; 0,65585546;  0,55913776

With my decimal_converter function I have to specify the number of columns the file contains. Normally numpy.genfromtxt does not need to be told the number of columns; it reads all the columns it finds. I would like to keep this behavior even when using the converters option.

askewchan
user1850133
  • There's a `NameError` with `nb_cols`, should be `num_cols` (Sorry, I can't edit two characters, meh!) – Ghanima Mar 16 '17 at 20:20

2 Answers

Since genfromtxt() accepts an iterator of lines, you can pass it a generator that applies your conversion to each line, which avoids the converters parameter altogether:

import numpy as np

def conv(x):
    return x.replace(',', '.').encode()

data = np.genfromtxt((conv(x) for x in open("test.txt")), delimiter=';')
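
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. It assumes the sample rows from the question, uses io.StringIO as a stand-in for the open file, and the name comma_to_point is only illustrative. It yields str lines rather than bytes, which I believe recent NumPy versions accept as well.

```python
import io

import numpy as np

# Sample rows from the question; io.StringIO stands in for an open file.
sample = io.StringIO(
    "0; 0,28321815;  0,5819178\n"
    "1; 0,56868281;  0,85621369\n"
    "2; 0,24022026;  0,53490058\n"
)

def comma_to_point(lines):
    """Yield each line with decimal commas replaced by points."""
    for line in lines:
        yield line.replace(',', '.')

# No converters argument, so genfromtxt still detects the column count itself.
data = np.genfromtxt(comma_to_point(sample), delimiter=';')
print(data.shape)
```

Note that replacing every comma in the line is only safe because the field delimiter here is a semicolon; it would not work for a file that also uses commas to separate fields.
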
Saullo G. P. Castro
  • numpy.genfromtxt can open gz or bz2 files; how can I add this feature using your solution? – user1850133 Jul 23 '14 at 10:08
  • @user1850133 you can use the same approach replacing `open()` by `gzip.open()`, [as explained in this thread](http://stackoverflow.com/a/10566609/832621) – Saullo G. P. Castro Jul 23 '14 at 11:00
  • Instead, I created a z_open() function that opens gz or bz2 files, or just returns the output of open() if the given file is uncompressed. – user1850133 Jul 23 '14 at 11:34
  • This seems not to be working (in Py3) as x is not a string anymore but a byte string and thus replace does not operate on it w/o decoding it first. – Ghanima Mar 16 '17 at 20:07
  • @Ghanima thank you for the update. I just edited the answer after testing in Python 3.5 – Saullo G. P. Castro Mar 16 '17 at 21:50
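
The z_open() helper mentioned in the comments is not shown in the thread. One possible sketch is below; it is a guess at the idea, not the commenter's actual code. It assumes the compression can be detected from the file extension, and load_decimal_comma is a hypothetical wrapper name.

```python
import bz2
import gzip

import numpy as np

def z_open(path):
    """Open path in text mode, handling .gz and .bz2 by extension (hypothetical helper)."""
    if path.endswith('.gz'):
        return gzip.open(path, 'rt')
    if path.endswith('.bz2'):
        return bz2.open(path, 'rt')
    return open(path, 'r')

def load_decimal_comma(path, delimiter=';'):
    """Read a delimited file whose numbers use a decimal comma (hypothetical wrapper)."""
    with z_open(path) as fh:
        # genfromtxt consumes the generator before returning, so the file
        # can safely be closed when the with block exits.
        lines = (line.replace(',', '.') for line in fh)
        return np.genfromtxt(lines, delimiter=delimiter)
```

With this, load_decimal_comma("file.csv.gz") and load_decimal_comma("file.csv") would go through the same code path.
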

Using the pandas library might not be an option for you, but if it is, its function read_csv has a decimal argument that can be used to configure the decimal point character. For example,

In [36]: !cat file.ssv
    0; 0,28321815;  0,5819178
    1; 0,56868281;  0,85621369
    2; 0,24022026;  0,53490058
    3; 0,63641921;  0,0293904
    4; 0,65585546;  0,55913776

In [37]: import pandas as pd

In [38]: df = pd.read_csv("file.ssv", delimiter=';', decimal=',', header=None)

In [39]: df
Out[39]: 
   0         1         2
0  0  0.283218  0.581918
1  1  0.568683  0.856214
2  2  0.240220  0.534901
3  3  0.636419  0.029390
4  4  0.655855  0.559138

[5 rows x 3 columns]

You then have all that pandas goodness with which to manipulate this data. Or you could convert the data frame to a numpy array:

In [51]: df.to_numpy()  # as_matrix() was removed in pandas 1.0
Out[51]: 
array([[ 0.        ,  0.28321815,  0.5819178 ],
       [ 1.        ,  0.56868281,  0.85621369],
       [ 2.        ,  0.24022026,  0.53490058],
       [ 3.        ,  0.63641921,  0.0293904 ],
       [ 4.        ,  0.65585546,  0.55913776]])
Warren Weckesser