I am importing a 153,673*25 CSV data matrix containing integers, floats and strings using pandas, through the IPython console in Anaconda's Spyder (Python 2). I then want to transform this data into a NumPy structured array, taking the field names from the DataFrame's column names and specifying the types manually. Here is the code; the functions importing_data.run() and attributes_names.run() respectively import the CSV data as a pandas DataFrame and extract its column names as a list:
import pandas
import numpy
import importing_data
import attributes_names
csv_data = importing_data.run()
names = attributes_names.run(csv_data)
type_list = ['int',
'str',
'str',
...
'float',
'int',
'int',
]
data_type = zip(names,type_list)
n_rows = len(csv_data.ix[:,0])
n_columns = len(csv_data.ix[0,:])
data_sample = numpy.zeros((n_rows,n_columns),dtype=data_type)
for i in range(0,n_columns):
    column = csv_data.ix[:,i].values
    data_sample[:,i] = column
However, the final loop seems to be failing: it sometimes forces the kernel to restart, and when it doesn't, the data_sample array has an unexpected structure. I can't describe it precisely, as lately I have only had kernel restarts, but I believe it was a 153,673*25 array made up of 153,673-dimensional lists.
What am I doing wrong here?
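To make the shape question concrete, here is a minimal sketch comparing a 2-D and a 1-D allocation with a structured dtype. The three-field dtype, the field names and the 'U10' string width are assumptions standing in for the real 25-field dtype. With a structured dtype, every cell of the array is a complete record, so an (n_rows, n_columns) allocation holds n_columns times as many records as a one-record-per-row array, which may be related to the memory pressure behind the kernel restarts.

```python
import numpy

# Hypothetical three-field dtype standing in for the real 25-field one.
data_type = [('id', 'int64'), ('label', 'U10'), ('score', 'float64')]

n_rows, n_columns = 4, 3

# 2-D allocation: each of the n_rows * n_columns cells is a full record
# carrying all three fields.
grid = numpy.zeros((n_rows, n_columns), dtype=data_type)

# 1-D allocation: one record per row; columns are reached by field name.
records = numpy.zeros(n_rows, dtype=data_type)

print(grid.shape, grid.nbytes)
print(records.shape, records.nbytes)
```

The printed byte counts differ by a factor of n_columns, since both arrays share the same record size.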
Edit
A first mistake I was making is the following: instead of
data_sample = numpy.zeros((n_rows,n_columns),dtype=data_type)
I have to write:
data_sample = numpy.zeros((n_rows,1),dtype=data_type)
I have redefined the loop as follows:
for i in range(0,n_rows):
    data_sample[i,0] = csv_data.values[i,:]
But now I get the following error message: TypeError: expected a single-segment buffer object
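For comparison, a minimal sketch of one way such a fill could be written, assuming a one-dimensional structured array with one record per row, filled one field at a time. The small DataFrame, its column names and the 'U10' string width are hypothetical stand-ins for the real data; the plain 'str' type is replaced by an explicit width here because a structured field needs a fixed size.

```python
import numpy
import pandas

# Hypothetical stand-in for csv_data; the real frame has 25 columns.
csv_data = pandas.DataFrame({
    'id': [1, 2, 3],
    'label': ['a', 'b', 'c'],
    'score': [0.5, 1.5, 2.5],
}, columns=['id', 'label', 'score'])

names = list(csv_data.columns)
type_list = ['int64', 'U10', 'float64']  # assumed types and string width
data_type = list(zip(names, type_list))  # list() so it also works on Python 3

# One record per row: the shape is (n_rows,), not (n_rows, n_columns).
data_sample = numpy.zeros(len(csv_data), dtype=data_type)

# Fill field by field; each field is an ordinary 1-D array view.
for name in names:
    data_sample[name] = numpy.asarray(csv_data[name])

print(data_sample)
```

If rows are assigned one at a time instead, NumPy expects a tuple per record, for example data_sample[i] = tuple(csv_data.values[i, :]), rather than an array slice. pandas also provides DataFrame.to_records(index=False), which builds a record array directly, although string columns come out with object dtype there.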