I have been trying to convert a pandas DataFrame into a numpy array, carrying over the dtypes and column names for ease of reference. I need to do this because processing in pandas is far too slow for me; numpy is roughly 10 times faster. I found this code on SO that gives me what I need, except that the result does not look like a standard numpy array: its shape does not include the number of columns.
[In]:
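import numpy as np
import pandas as pd
from numpy.random import randn
df = pd.DataFrame(randn(10,3),columns=['Acol','Ccol','Bcol'])
# to_numpy() replaces as_matrix(), which was removed in pandas 1.0
arr_ip = [tuple(i) for i in df.to_numpy()]
# one (column name, dtype) pair per field of the structured dtype
dtyp = np.dtype(list(zip(df.dtypes.index, df.dtypes)))
dfnp = np.array(arr_ip, dtype=dtyp)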
print(dfnp.shape)
dfnp
[Out]:
(10,) #expecting (10,3)
array([(-1.0645345 , 0.34590193, 0.15063829),
( 1.5010928 , 0.63312454, 2.38309797),
(-0.10203999, -0.40589525, 0.63262773),
( 0.92725915, 1.07961763, 0.60425353),
( 0.18905164, -0.90602597, -0.27692396),
(-0.48671514, 0.14182815, -0.64240004),
( 0.05012859, -0.01969079, -0.74910076),
( 0.71681329, -0.38473052, -0.57692395),
( 0.60363249, -0.0169229 , -0.16330232),
( 0.04078263, 0.55943898, -0.05783683)],
dtype=[('Acol', '<f8'), ('Ccol', '<f8'), ('Bcol', '<f8')])
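For context, what this code builds is a NumPy structured array: it is one-dimensional, each element is a whole row (record), and the column names and dtypes live in `dfnp.dtype`, so `shape` counts rows only. A minimal sketch of the two layouts follows, assuming all columns share one dtype (float64 here), which is the only case where a plain 2-D array can hold them at all. `numpy.lib.recfunctions.structured_to_unstructured` needs NumPy 1.16 or later.

```python
import numpy as np
import pandas as pd
from numpy.lib import recfunctions as rfn

df = pd.DataFrame(np.random.randn(10, 3), columns=['Acol', 'Ccol', 'Bcol'])

# Structured array, built as in the question: 1-D, one record per row
dtyp = np.dtype(list(zip(df.dtypes.index, df.dtypes)))
dfnp = np.array([tuple(r) for r in df.to_numpy()], dtype=dtyp)
print(dfnp.shape)          # (10,)
print(dfnp.dtype.names)    # ('Acol', 'Ccol', 'Bcol')
print(dfnp['Ccol'].shape)  # (10,) -> a column is accessed by name

# Plain 2-D array: same values, but the names are no longer attached
arr2d = rfn.structured_to_unstructured(dfnp)
print(arr2d.shape)         # (10, 3)
```

The trade-off is that the 2-D form gives the `(10, 3)` shape but drops the field names, so they would have to be kept separately (e.g. from `df.columns`).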
Am I missing something, or is there another way to do this? I have many DataFrames to convert, and their dtypes and column names vary, so I need an automated approach. It also needs to be efficient, given how many DataFrames there are.
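One possible way to automate this for many DataFrames with differing schemas is sketched below, under the assumptions that column names are unique strings (or convertible with `str`) and that every column has a plain NumPy dtype; pandas extension dtypes such as `category` or nullable `Int64` would need converting first. `df_to_structured` is a hypothetical helper name. It fills the structured array one column at a time instead of building a Python tuple per row, which should be cheaper for long frames. pandas' built-in `DataFrame.to_records(index=False)` returns a comparable `numpy.recarray`.

```python
import numpy as np
import pandas as pd


def df_to_structured(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: copy a DataFrame into a NumPy structured array.

    Field names come from the column labels and field dtypes from df.dtypes,
    so frames with different schemas need no per-frame code.
    Assumes unique column names and plain NumPy dtypes for every column.
    """
    names = [str(c) for c in df.columns]
    dtype = np.dtype(list(zip(names, df.dtypes)))
    out = np.empty(len(df), dtype=dtype)
    # Column-wise copy: one vectorised assignment per field, no per-row tuples
    for name, col in zip(names, df.columns):
        out[name] = df[col].to_numpy()
    return out


# Usage: frames with different column names and dtypes
frames = [
    pd.DataFrame(np.random.randn(10, 3), columns=['Acol', 'Ccol', 'Bcol']),
    pd.DataFrame({'id': np.arange(4), 'flag': [True, False, True, True]}),
]
converted = [df_to_structured(f) for f in frames]
for arr in converted:
    print(arr.shape, arr.dtype)
```

The result is still a 1-D structured array (shape `(n,)`), since mixed dtypes cannot share a single 2-D array; columns are read by name, e.g. `converted[1]['flag']`.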