Apologies if this has already been answered somewhere and I am just searching for the wrong terms. Here is my problem:

I have a pandas DataFrame called data, and I want to build a decision tree using a module provided by my lecturer. The module comes with an example that uses data.columns to get an Index object containing only the column names, and then slices it to select the descriptive features. For my homework I also need to select descriptive features, but slicing won't work because the columns I want are not contiguous. I tried to select them like this:

desc_columns = columns['workclass','education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']

This fails with an error saying that only integers are accepted as an index. Then I tried to access the columns by position:

desc_columns = columns[1,2,4,5,6,7,8,10]

This gives me IndexError: too many indices for array. Can someone tell me what this Index object is and how I can select arbitrary elements from it?

Gasp0de
  • What does `print(data.columns)` output? – jezrael Dec 08 '18 at 16:08
  • Just use `data = data[['workclass','education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']]`. – jpp Dec 08 '18 at 16:08
  • print (data.columns) gives me `Index(['age', 'workclass', 'education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'hours-per-week', 'native-country', 'label'], dtype='object')`. Now I need an Index object that contains only some of those column labels – Gasp0de Dec 08 '18 at 16:13
  • @jpp I don't want the data, I need this Index object that contains the column names which is returned by data.columns. The problem is that I need only some of those columns. However, as a workaround I could probably do what you suggested and then do data.columns again! I'll try that now. – Gasp0de Dec 08 '18 at 16:15
  • @Gasp0de, Is there a reason you *need* a `pd.Index` object as opposed to the input list? Pandas usually works equally with `list` and there's no necessity for `pd.Index` for most use cases. – jpp Dec 08 '18 at 16:16
  • @jpp I am using this decision tree module that is provided by my lecturer. They are using this pd.Index, I will try and see if it also works with a list, thank you! – Gasp0de Dec 08 '18 at 16:19
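
For reference, here is a minimal sketch of the options discussed above. It assumes `columns = data.columns` and uses a one-row stand-in frame with made-up values, since the real data is not shown; only the column labels come from the `print(data.columns)` output quoted in the comments. The variable names `wanted`, `by_position`, `by_label` and `direct` are illustrative. The key point is that `columns[1, 2, 4]` passes a tuple, which is read as one index per dimension, whereas `columns[[1, 2, 4]]` passes a list of positions.

```python
import pandas as pd

# Stand-in frame: labels taken from the question's comments, values are placeholders.
labels = ['age', 'workclass', 'education', 'education-num', 'marital-status',
          'occupation', 'relationship', 'race', 'sex', 'hours-per-week',
          'native-country', 'label']
data = pd.DataFrame([[0] * len(labels)], columns=labels)
columns = data.columns  # assumption: this is how `columns` was defined

wanted = ['workclass', 'education', 'marital-status', 'occupation',
          'relationship', 'race', 'sex', 'native-country']

# Option 1: index by position with a *list* inside the brackets.
by_position = columns[[1, 2, 4, 5, 6, 7, 8, 10]]

# Option 2: the workaround from the comments: select the columns by label,
# then ask the resulting frame for its columns again.
by_label = data[wanted].columns

# Option 3: build an Index directly from the list of labels.
direct = pd.Index(wanted)

print(by_position)
print(by_position.equals(by_label), by_position.equals(direct))
```

All three produce an Index holding the same eight labels. As noted in the comments, a plain list such as `wanted` may be enough if the module does not strictly require a `pd.Index`.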
