8

I used the following code to convert the scikit-learn breast cancer dataset to a DataFrame, but I am not getting any output. I am very new to Python and cannot figure out what is wrong.

def answer_one(): 

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_breast_cancer 
    cancer = load_breast_cancer()     
    data = numpy.c_[cancer.data, cancer.target]
    columns = numpy.append(cancer.feature_names, ["target"])
    return pandas.DataFrame(data, columns=columns)

answer_one()
talonmies
solly bennet

4 Answers

8

Use pandas

There was a great answer here: How to convert a Scikit-learn dataset to a Pandas dataset?

The keys of the Bunch object returned by load_breast_cancer() tell you which fields are available to turn into columns.

df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['target'] = pd.Series(cancer.target)
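
A self-contained sketch of the two lines above, for anyone who wants to run them as-is. The helper name `bunch_to_frame` is hypothetical (it is not part of scikit-learn or pandas); it only wraps the same two steps and prints the Bunch keys first so you can see what is available.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer


def bunch_to_frame(bunch):
    """Build a DataFrame from a Bunch's data, feature_names and target.

    bunch_to_frame is a hypothetical helper name used only for illustration.
    """
    df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    # A Series with the default 0..n-1 index lines up with the frame's rows.
    df["target"] = pd.Series(bunch.target)
    return df


cancer = load_breast_cancer()
print(list(cancer.keys()))  # the fields the Bunch exposes

df = bunch_to_frame(cancer)
print(df.shape)
```

The feature columns keep the names from `cancer.feature_names`, and the labels end up in a final `target` column.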
7

As of scikit-learn 0.23 you can do the following to get a DataFrame and save some keystrokes:

df = load_breast_cancer(as_frame=True)
df.frame
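
A slightly longer sketch of the same idea, assuming scikit-learn 0.23 or later is the version actually loaded. Note that the call still returns a Bunch even with `as_frame=True`; the combined DataFrame lives under its `.frame` attribute. The variable names `features`, `target` and `full` are just illustrative.

```python
from sklearn.datasets import load_breast_cancer

# as_frame requires scikit-learn 0.23 or later.
cancer = load_breast_cancer(as_frame=True)

features = cancer.data   # DataFrame with the feature columns only
target = cancer.target   # Series with the class labels
full = cancer.frame      # DataFrame with the features plus a "target" column

print(type(cancer).__name__)  # the return value itself is still a Bunch
print(full.shape)
```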
jeffhale
  • Not working for me for some reason in Google Colab. Colab has 0.22, but I upgraded to 0.24 using pip (and `__version__` shows the updated version); using as_frame=True still returns a Bunch :-/ – Levon Feb 17 '21 at 00:06
  • Question: How did you know to use the attribute `.frame`? I'm new to Python and sklearn and trying to figure out how you knew to use `.frame`, as I don't see it mentioned in the help docs. – Desmond Sep 25 '22 at 12:28
  • @Desmond, see the example here: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html "data: Bunch. Dictionary-like object, with the following attributes. data: {ndarray, dataframe} of shape (150, 4). The data matrix. If as_frame=True, data will be a pandas DataFrame. ... frame: DataFrame of shape (150, 5). Only present when as_frame=True. DataFrame with data and target." – jeffhale Sep 26 '22 at 11:58
6

The following code works:

def answer_one(): 
    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_breast_cancer 
    cancer = load_breast_cancer()     
    data = np.c_[cancer.data, cancer.target]
    columns = np.append(cancer.feature_names, ["target"])
    return pd.DataFrame(data, columns=columns)

answer_one()

Your code didn't work because you imported the packages under the aliases np and pd, but then referred to them by their full names, numpy and pandas. Those names are never defined, so Python raises a NameError.

However, I suggest importing the packages (with their aliases) at the beginning of the script, outside the function definition.
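
A minimal sketch of the aliasing behaviour described above, with the import placed at the top of the script. The variable names `error_message` and `combined` are only illustrative.

```python
import numpy as np

# `import numpy as np` binds only the name `np`; `numpy` itself stays undefined.
try:
    numpy.append([1, 2], [3])
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)

# Using the alias works.
combined = np.append([1, 2], [3])
print(combined.tolist())
```

The same applies to `import pandas as pd`: use either `import pandas` with `pandas.DataFrame`, or the alias with `pd.DataFrame`, but not a mix of the two.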

  • def answer_one(): data = numpy.c_[cancer.data, cancer.target] columns = numpy.append(cancer.feature_names, ["target"]) return pandas.DataFrame(data, columns=columns) answer_one() – solly bennet Feb 13 '18 at 17:22
  • I tried without defining them, but I still don't get any output. Is anything wrong with the return statement? – solly bennet Feb 13 '18 at 17:23
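
On the return statement: one possible explanation, assuming the code is run as a plain script rather than in a notebook cell or the interactive prompt, is that a bare `answer_one()` call discards the returned DataFrame without displaying it. A sketch that stores and prints the result (the name `result` is illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer


def answer_one():
    cancer = load_breast_cancer()
    data = np.c_[cancer.data, cancer.target]
    columns = np.append(cancer.feature_names, ["target"])
    return pd.DataFrame(data, columns=columns)


# In a plain script the return value is not echoed automatically,
# so keep it in a variable and print it explicitly.
result = answer_one()
print(result.head())
```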
3
def answer_one():
    # Assumes pandas is imported as pd and cancer = load_breast_cancer()
    dataframe = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
    dataframe['target'] = cancer.target
    return dataframe
4b0
Marckhz
  • Welcome to Stack Overflow! Code-only answers are not particularly helpful. Please include a brief description of how this code solves the problem. – 4b0 Apr 18 '20 at 20:07