How to read a file with mixed data types into a numpy array in Python?

I'm new to Python. I'm trying to read an existing file with mixed data types into a numpy array.

The content of the file data.txt (if a comma is not a good delimiter, it can be replaced by a space):

   ,'A','B','C','D'
'A',  0,  3,  5, -1
'B',  3,  0,  1,  6
'C',  5,  1,  0,  2
'D', -1,  6,  2,  0

The expected output numpy array is as follows:

array([[None,'A','B','C','D'],
       ['A',  0,  3,  5, -1 ],
       ['B',  3,  0,  1,  6 ],
       ['C',  5,  1,  0,  2 ],
       ['D', -1,  6,  2,  0 ]])
Haven Shi
    Possible duplicate of [How to read csv into record array in numpy?](https://stackoverflow.com/questions/3518778/how-to-read-csv-into-record-array-in-numpy) – Vinícius Figueiredo Jul 15 '17 at 19:33
    There's no possible way to get exactly your expected output with a plain numpy array (at least not if that's a 2D array). However you could try to read it into a `pandas.DataFrame`. – MSeifert Jul 15 '17 at 19:35

1 Answer

You could use pandas.read_csv:

>>> import pandas as pd

>>> df = pd.read_csv('data.txt', index_col=0, sep=',')
>>> print(df)
     'A'  'B'  'C'  'D'

'A'    0    3    5   -1
'B'    3    0    1    6
'C'    5    1    0    2
'D'   -1    6    2    0

You can then access the underlying array with .values:

>>> df.values
array([[ 0,  3,  5, -1],
       [ 3,  0,  1,  6],
       [ 5,  1,  0,  2],
       [-1,  6,  2,  0]], dtype=int64)
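
If the quote characters in the labels are unwanted, the parser can be asked to strip them. The following is only a sketch: it assumes the labels are always wrapped in single quotes and that every delimiter may be followed by padding spaces, and it reads from an in-memory copy of data.txt (the `DATA` string is an illustration, not part of the original file handling).

```python
import io

import pandas as pd

# In-memory copy of data.txt so the sketch runs without the file on disk.
DATA = (
    "   ,'A','B','C','D'\n"
    "'A',  0,  3,  5, -1\n"
    "'B',  3,  0,  1,  6\n"
    "'C',  5,  1,  0,  2\n"
    "'D', -1,  6,  2,  0\n"
)

df = pd.read_csv(
    io.StringIO(DATA),
    index_col=0,            # first column becomes the row labels
    quotechar="'",          # treat single quotes as quoting, not as data
    skipinitialspace=True,  # ignore padding spaces after each comma
)

labels = df.index.tolist()  # row labels without the quote characters
matrix = df.to_numpy()      # numeric block as a plain 2D integer array

print(labels)
print(matrix)
```

Here `to_numpy()` plays the same role as `.values`; the labels stay in `df.index` and `df.columns` rather than inside the array.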

As far as I know, it's not possible to read that file into a plain (non-object) 2D array, because even a record array requires every column to hold a single type. That would work for the data rows (str, int, int, int, int), but not for the header row (NoneType, str, str, str, str). With pandas, however, you can interpret the first row and first column as column labels and an index, which can have a different type from the data.
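
If an object array is acceptable, something close to the expected output in the question can be built by hand. This is a rough sketch rather than a recommended approach: `parse_cell` is a hypothetical helper, the `DATA` string stands in for data.txt, and the resulting array has `dtype=object`, so it loses the speed and arithmetic convenience of a numeric array.

```python
import csv
import io

import numpy as np

# In-memory copy of data.txt so the sketch runs without the file on disk.
DATA = (
    "   ,'A','B','C','D'\n"
    "'A',  0,  3,  5, -1\n"
    "'B',  3,  0,  1,  6\n"
    "'C',  5,  1,  0,  2\n"
    "'D', -1,  6,  2,  0\n"
)


def parse_cell(cell):
    """Hypothetical helper: empty -> None, integer text -> int, else str."""
    cell = cell.strip()
    if not cell:
        return None
    try:
        return int(cell)
    except ValueError:
        return cell


# quotechar strips the single quotes; skipinitialspace drops the padding.
reader = csv.reader(io.StringIO(DATA), quotechar="'", skipinitialspace=True)
rows = [[parse_cell(cell) for cell in row] for row in reader]

# dtype=object lets each element keep its own Python type.
arr = np.array(rows, dtype=object)
print(arr)
```

Every element of `arr` is an ordinary Python object, so numeric operations on the inner block would first need a conversion such as `arr[1:, 1:].astype(int)`.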

However, if you don't need the first row and column, you could use np.loadtxt:

>>> import numpy as np

>>> np.loadtxt('data.txt', delimiter=',', skiprows=1, usecols=[1,2,3,4], dtype=int)
array([[ 0,  3,  5, -1],
       [ 3,  0,  1,  6],
       [ 5,  1,  0,  2],
       [-1,  6,  2,  0]])
MSeifert