I have a pandas DataFrame built from the sklearn.datasets Boston house price data, and I am trying to convert it to a NumPy array while keeping the column names. Here is the code I tried:

from sklearn import datasets ## imports datasets from scikit-learn
import numpy as np
import pandas as pd

data = datasets.load_boston() ## loads Boston dataset from datasets library

df = pd.DataFrame(data.data, columns=data.feature_names)
X = df.to_numpy()
print(X.dtype.names)

However, this prints None, so the column names are not kept. Does anyone understand why?

Thanks

geds133
  • Why do you expect the column names to be retained when you access the underlying array instead of the DataFrame? You can store the column names in a dictionary/array if you want access to them later – anky May 05 '20 at 18:24
  • I assumed the code would create a structured array from the pandas DataFrame. I followed this answer to get there: https://stackoverflow.com/questions/7561017/get-the-column-names-of-a-python-numpy-ndarray – geds133 May 05 '20 at 18:25
  • @geds133 No, the corresponding method is `to_records`. `to_numpy` doesn't yield a structured array. – ayhan May 05 '20 at 18:28
  • I see, there is a question on Stack Overflow that suggests this is the case. I shall comment and ask for a correction. Many thanks – geds133 May 05 '20 at 18:30
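
The difference pointed out in the comments can be sketched as follows. This is only a minimal illustration: the small DataFrame is made-up data standing in for the Boston set, and the variable names `plain` and `records` are chosen here for the example. `to_numpy` gives an ordinary 2-D array with no field names, while `to_records` gives a record array whose dtype carries the column names.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for the Boston data (values are illustrative only).
df = pd.DataFrame({"CRIM": [0.1, 0.2], "ZN": [18.0, 0.0], "INDUS": [2.31, 7.07]})

plain = df.to_numpy()                 # ordinary 2-D float array, no field names
records = df.to_records(index=False)  # record array, one named field per column

print(plain.dtype.names)    # None
print(records.dtype.names)  # ('CRIM', 'ZN', 'INDUS')
print(records["ZN"])        # a column can be selected by its field name
```

Passing `index=False` leaves the DataFrame index out of the result; without it, the index is included as an extra field.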

1 Answer

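Try this:

# feature_names as a single header row of shape (1, 13)
w = data.feature_names.reshape(1, -1)
# stacking strings on top of floats converts the whole array to strings
X = np.vstack((w, data.data))
print(X)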
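
One thing to be aware of with this approach, shown here as a rough sketch on made-up data rather than the Boston set (the names `names`, `values`, `stacked` and `recovered` are hypothetical): a plain NumPy array has a single dtype, so putting a row of strings on top of floats turns every element into a string. The numbers have to be sliced out and cast back before they can be used numerically.

```python
import numpy as np

names = np.array(["CRIM", "ZN"])               # hypothetical column names
values = np.array([[0.1, 18.0], [0.2, 0.0]])   # hypothetical data

# Header row on top of the data: the result has one common (string) dtype.
stacked = np.vstack((names.reshape(1, -1), values))
print(stacked.dtype.kind)   # 'U', i.e. a unicode string array

# Drop the header row and cast back to recover the numeric values.
recovered = stacked[1:].astype(float)
print(recovered)
```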
K.J Fogang Fokoa