Change pandas DataFrame to numpy array but keeping column names

Question

I have a pandas DataFrame built from the sklearn.datasets Boston house price data, and I am trying to convert it to a NumPy array while keeping the column names. Here is the code I tried:

from sklearn import datasets ## imports datasets from scikit-learn
import numpy as np
import pandas as pd

data = datasets.load_boston() ## loads Boston dataset from datasets library

df = pd.DataFrame(data.data, columns=data.feature_names)
X = df.to_numpy()
print(X.dtype.names)

However, this prints None, so the column names are not kept. Does anyone understand why?

Thanks

Why do you expect column names to be retained when you access the underlying array instead of the DataFrame? You can store the column names in a dictionary/array if you want access to them later. — anky, May 05 '20 at 18:24
I assumed the code would create a structured array from the pandas DataFrame. I followed this answer to get there: https://stackoverflow.com/questions/7561017/get-the-column-names-of-a-python-numpy-ndarray — geds133, May 05 '20 at 18:25
@geds133 No, the corresponding method is `to_records`. `to_numpy` doesn't yield a structured array. — ayhan, May 05 '20 at 18:28
I see, there is a question on Stack that suggests this is the case. I shall comment and ask for correction. Many Thanks — geds133, May 05 '20 at 18:30
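
To illustrate the point made in the comments, here is a minimal sketch of the difference between `to_numpy` and `to_records`. The small DataFrame below is a hypothetical stand-in for the Boston data (the values are made up for illustration), so the snippet does not depend on `load_boston` being available.

```python
import pandas as pd

# Hypothetical stand-in for the Boston DataFrame: three columns, two rows.
df = pd.DataFrame(
    {"CRIM": [0.00632, 0.02731], "ZN": [18.0, 0.0], "INDUS": [2.31, 7.07]}
)

# to_numpy gives a regular 2-D array; its dtype has no named fields.
plain = df.to_numpy()
print(plain.dtype.names)      # None

# to_records gives a record array with one named field per column.
# index=False leaves the DataFrame index out of the fields.
records = df.to_records(index=False)
print(records.dtype.names)    # ('CRIM', 'ZN', 'INDUS')

# Fields can then be accessed by column name.
print(records["ZN"])
```

Note that the record array is one-dimensional (one record per row), so code that expects a 2-D numeric matrix would still need `to_numpy`.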

score 0 · Answer 1 · answered May 06 '20 at 13:12

Try this:

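# Turn the 13 feature names into a single header row of shape (1, 13)
header = data.feature_names.reshape(1, -1)
# Stack the header on top of the data; a plain array has one dtype,
# so the names and the numbers end up sharing it
X = np.vstack((header, data.data))
print(X)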

K.J Fogang Fokoa
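
Because a plain ndarray has a single dtype, a header row of names cannot sit next to float values without changing how the numbers are stored. A possible alternative, following the suggestion in the first comment, is to keep the names in a separate lookup. This is only a sketch: the DataFrame and the `col` dictionary are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical two-column stand-in for the Boston DataFrame.
df = pd.DataFrame({"CRIM": [0.00632, 0.02731], "ZN": [18.0, 0.0]})

X = df.to_numpy()            # numeric 2-D array, values stay floats
names = list(df.columns)     # column names kept alongside the array

# Map each column name to its position in X.
col = {name: i for i, name in enumerate(names)}

# Select a column of X by name through the lookup.
print(X[:, col["ZN"]])
```

This keeps `X` usable for numeric work while still allowing lookup by column name.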