numpy genfromtxt converters unknown number of columns

Question

I have several numeric data files in which the decimal separator is a comma, so I use a lambda function to convert the values:

import numpy as np
def decimal_converter(num_cols):
    conv = dict((col, lambda valstr: \
    float(valstr.decode('utf-8').replace(',', '.'))) for col in range(nb_cols))
    return conv

data = np.genfromtxt("file.csv", converters = decimal_converter(3))

The data in the file looks like this:

0; 0,28321815;  0,5819178
1; 0,56868281;  0,85621369
2; 0,24022026;  0,53490058
3; 0,63641921;  0,0293904
4; 0,65585546;  0,55913776

With my decimal_converter function I have to specify the number of columns the file contains. Normally numpy.genfromtxt does not need to be told the number of columns; it reads every column it finds. I would like to keep this behaviour even when using the converters option.

This raises a `NameError`: `nb_cols` should be `num_cols`. (Sorry, I can't submit an edit of only two characters.) – Ghanima Mar 16 '17 at 20:20

Saullo G. P. Castro · Accepted Answer · 2017-03-16T21:50:19.090

Since genfromtxt() accepts an iterator of lines, you can pass it a generator that applies your conversion to each line and skip the converters parameter entirely, so the number of columns never has to be specified:

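import numpy as np

def conv(x):
    # Swap the decimal comma for a point and hand genfromtxt the line as bytes.
    return x.replace(',', '.').encode()

# Pass a generator of converted lines instead of a file name.
with open("test.txt") as f:
    data = np.genfromtxt((conv(x) for x in f), delimiter=';')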
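
As a self-contained illustration of the same idea, the sketch below feeds genfromtxt a generator of converted lines. The sample data, the `comma_to_point` name, and the use of `io.StringIO` in place of a real file are assumptions made for the example. It also assumes a NumPy version on Python 3 that accepts text (str) lines, so no `.encode()` step is used.

```python
import io

import numpy as np

# Sample data in the question's format: ';' separates the columns
# and ',' is the decimal mark.
SAMPLE = """0; 0,28321815;  0,5819178
1; 0,56868281;  0,85621369
2; 0,24022026;  0,53490058
"""


def comma_to_point(lines):
    """Yield each line with decimal commas replaced by points."""
    for line in lines:
        yield line.replace(",", ".")


# io.StringIO stands in for an open text file (assumption for the demo).
data = np.genfromtxt(comma_to_point(io.StringIO(SAMPLE)), delimiter=";")
print(data.shape)
```

The array gets one column per field found in the data, so the column count is never passed explicitly.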

edited Mar 16 '17 at 21:50

answered Apr 09 '14 at 19:23


numpy.genfromtxt can open gz or bz2 files; how can I add this feature using your solution? – user1850133 Jul 23 '14 at 10:08
@user1850133 you can use the same approach replacing `open()` by `gzip.open()`, [as explained in this thread](http://stackoverflow.com/a/10566609/832621) – Saullo G. P. Castro Jul 23 '14 at 11:00
Instead, I created a z_open() function that opens gz or bz2 files, or just returns the output of open() if the given file is uncompressed. – user1850133 Jul 23 '14 at 11:34
This does not seem to work in Python 3: x is no longer a string but a byte string, so replace cannot operate on it without decoding it first. – Ghanima Mar 16 '17 at 20:07
@Ghanima thank you for the update. I just edited the answer after testing in Python 3.5 – Saullo G. P. Castro Mar 16 '17 at 21:50
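
The z_open() helper mentioned in the comments above is not shown in the thread. A minimal sketch of what such a helper might look like follows. Choosing the opener from the file extension is an assumption, and `z_open` and `decimal_lines` as written here are hypothetical names, not the commenter's actual code.

```python
import bz2
import gzip


def z_open(path):
    """Open a .gz, .bz2, or plain file in text mode (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def decimal_lines(path):
    """Yield the lines of `path` with decimal commas replaced by points."""
    with z_open(path) as handle:
        for line in handle:
            yield line.replace(",", ".")
```

With NumPy imported as `np`, the generator can be passed straight to genfromtxt, for example `np.genfromtxt(decimal_lines("file.csv.gz"), delimiter=';')`, where the file name is a placeholder.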

Warren Weckesser · Answer 2 · 2014-04-09T19:38:44.440

Using the pandas library might not be an option for you, but if it is, its function read_csv has a decimal argument that can be used to configure the decimal point character. For example,

In [36]: !cat file.ssv
    0; 0,28321815;  0,5819178
    1; 0,56868281;  0,85621369
    2; 0,24022026;  0,53490058
    3; 0,63641921;  0,0293904
    4; 0,65585546;  0,55913776

In [37]: import pandas as pd

In [38]: df = pd.read_csv("file.ssv", delimiter=';', decimal=',', header=None)

In [39]: df
Out[39]: 
   0         1         2
0  0  0.283218  0.581918
1  1  0.568683  0.856214
2  2  0.240220  0.534901
3  3  0.636419  0.029390
4  4  0.655855  0.559138

[5 rows x 3 columns]

You then have all that pandas goodness with which to manipulate this data. Or you could convert the data frame to a numpy array:

In [51]: df.values
Out[51]: 
array([[ 0.        ,  0.28321815,  0.5819178 ],
       [ 1.        ,  0.56868281,  0.85621369],
       [ 2.        ,  0.24022026,  0.53490058],
       [ 3.        ,  0.63641921,  0.0293904 ],
       [ 4.        ,  0.65585546,  0.55913776]])

Looks interesting, but I would need to install it. – user1850133 Apr 10 '14 at 09:05
