automatic detection/conversion of data types?

Question

Is there a function in numpy that determines whether strings should be integers or floating point numbers and automatically converts them? For instance, I often have a collection of records which are parsed from a text file using a combination of str.strip() and str.split(). Then I get something like

List = [['1','a','.3'],
        ['2','b','-.5']]

Which is then converted using numpy.rec.fromrecords:

In [1227]: numpy.rec.fromrecords(List)
Out[1227]: 
rec.array([('1', 'a', '.3'), ('2', 'b', '-.5')], 
      dtype=[('f0', '|S1'), ('f1', '|S1'), ('f2', '|S3')])

In R, there is a function called type.convert to which vectors/columns of character strings are passed and it will determine what the type for the column should be (i.e. if it's a mix of strings and numbers it will remain a character vector). Excel does this also (based on its first 6 elements, if I recall correctly)...

Is there such a function in NumPy/Python? I know I could probably write a function to test whether each element of a column could be converted to an integer, etc., but is there anything built in? I know in all the examples the prescription is to specify the dtypes explicitly, but I would like to skip this step. Thanks.

Related: https://stackoverflow.com/questions/6824862/data-type-recognition-guessing-of-csv-data-in-python — Anton Tarasenko, Jun 29 '18 at 15:42

score 5 · Accepted Answer · edited May 23 '17 at 10:29

numpy.genfromtxt can guess dtypes if you set dtype=None:

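import io

import numpy as np

alist = [['1', 'a', '.3'],
         ['2', 'b', '-.5']]

# Join the fields back into whitespace-delimited lines. On Python 3,
# io.BytesIO requires bytes, so the joined text is encoded first.
f = io.BytesIO('\n'.join(' '.join(row) for row in alist).encode())
# dtype=None tells genfromtxt to infer each column's type from its contents.
arr = np.genfromtxt(f, dtype=None)
print(arr)
print(arr.dtype)
# Output as originally posted (the integer width and the string dtype
# depend on the platform and on the Python/NumPy version):
# [(1, 'a', 0.3) (2, 'b', -0.5)]
# [('f0', '<i4'), ('f1', '|S1'), ('f2', '<f8')]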

Note that it would be better to apply np.genfromtxt directly to your text file instead of creating the intermediate list List (or what I called alist). If you need to do some processing of the file before sending it to np.genfromtxt, you could make a file-like object wrapper around the file which can do the processing and be passed to np.genfromtxt.
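
As a rough sketch of that wrapper idea: np.genfromtxt also accepts a generator of lines, so the preprocessing can happen lazily without building an intermediate list. The example assumes NumPy 1.14 or later (for the encoding argument). The sample data, the clean_lines helper, and the "drop a trailing semicolon" rule are all made up for illustration.

```python
import io

import numpy as np


def clean_lines(fileobj):
    """Yield preprocessed lines from a text file object (hypothetical helper)."""
    for line in fileobj:
        line = line.strip().rstrip(';')  # example cleanup: drop a trailing ';'
        if line:                         # skip lines that are empty after cleanup
            yield line


# Stand-in for open('data.txt'); any text-mode file object would do.
raw = io.StringIO("1 a .3;\n\n2 b -.5;\n")

# dtype=None infers the column types; encoding=None keeps text columns as str.
arr = np.genfromtxt(clean_lines(raw), dtype=None, encoding=None)
print(arr)
print(arr.dtype)
```

The same pattern should work with a real file by replacing the io.StringIO stand-in with an open file handle.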

answered Nov 05 '11 at 12:56

unutbu


That's a very interesting solution! Seems a bit indirect... but maybe this is still the best way... – hatmatrix Nov 06 '11 at 00:58
Actually the object-wrapper concept is quite helpful, as with the `io.BytesIO` trick. I looked at the source code to extract the part which does the conversion but it seems like it is not so straightforward as it is not its own modular component within `np.genfromtxt`. This seems best. – hatmatrix Nov 10 '11 at 17:55
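
For comparison, the hand-written approach mentioned in the question (test whether a whole column converts to int, then to float, otherwise leave it as strings) could look roughly like the sketch below. This is not NumPy's own inference logic, and convert_column and convert_records are hypothetical names. It assumes every row has the same number of fields.

```python
import numpy as np


def convert_column(values):
    """Return the column as ints, else floats, else the original strings."""
    for caster in (int, float):
        try:
            return [caster(v) for v in values]
        except ValueError:
            continue  # at least one element failed; try the next type
    return list(values)


def convert_records(rows):
    """Transpose rows into columns, convert each, and build a record array."""
    columns = [convert_column(col) for col in zip(*rows)]
    return np.rec.fromarrays(columns)


records = [['1', 'a', '.3'],
           ['2', 'b', '-.5']]
rec = convert_records(records)
print(rec)
print(rec.dtype)
```

A column is only converted when every element parses, so a column mixing numbers and text stays a string column, similar in spirit to the type.convert behaviour described in the question.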
