
I can't figure this bug out. I suspect it comes from a misunderstanding of DataFrames and how to index into one, or maybe a misunderstanding of for loops. (I am used to MATLAB for loops... iterating there is, intuitively, way easier :D)

Here is the error:

KeyError: "['United States' 'Canada' 'Mexico'] not found in axis"

This happens at the line: as_df=as_df.drop(as_df[column])

But this makes no sense... I am selecting an individual column, not the entire set of dummy variables.

The following code can be copied and run. I made sure of it.

MY CODE:

import pandas as pd
import numpy as np
df=pd.DataFrame({"country": ['United States','Canada','Mexico'], "price": [23,32,21], "points": [3,4,4.5]})
df=df[['country','price','points']]
df2=df[['country']]
features=df2.columns
print(features)
target='points'

#------_-__-___---____________________
as_df=pd.concat([df[features],df[target]],axis=1)
#Now for Column Check
for column in as_df[features]:
    col=as_df[[column]]
    #Categorical Data Conversion
    #This will split the countries into their own column with 1 being when it 
    #is true and 0 being when it is false
    col.select_dtypes(include='object')
    dummies=pd.get_dummies(col)
    #ML Check:
    dumcols=dummies.drop(dummies.columns[1],axis=1)
    if dumcols.shape[1] > 1:
        print(column)
        as_df=as_df.drop(as_df[column])
    else:
        dummydf=col
as_df=pd.concat([as_df,dummydf],axis=1)
as_df.head()
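The comments in the loop describe one-hot (dummy) encoding, and the `dumcols` step drops one dummy column so the remaining indicators are not redundant. Separately from the `KeyError`, note that `dummydf` is only assigned in the `else` branch; for `country` the `if` branch runs instead, so in a fresh session the final `pd.concat` would raise a `NameError` even after the `drop` call is fixed. A minimal sketch of the apparent intent, assuming the goal is to replace each text feature with k-1 indicator columns (the `prefix` and `drop_first` choices and the numeric-dtype check are assumptions, not part of the original code):

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "Canada", "Mexico"],
    "price": [23, 32, 21],
    "points": [3, 4, 4.5],
})
features = ["country"]
target = "points"

as_df = pd.concat([df[features], df[target]], axis=1)

for column in features:
    # Assumption: only non-numeric (text/categorical) columns get encoded
    if pd.api.types.is_numeric_dtype(as_df[column]):
        continue
    # One indicator column per category; drop_first=True keeps k-1 of them
    dummies = pd.get_dummies(as_df[column], prefix=column, drop_first=True)
    # Drop the original column by *name* on the columns axis, then attach the dummies
    as_df = pd.concat([as_df.drop(columns=column), dummies], axis=1)

print(as_df.columns.tolist())  # ['points', 'country_Mexico', 'country_United States']
```

`pd.get_dummies` orders categories alphabetically, so `drop_first=True` removes `country_Canada` here; which category is dropped is arbitrary for modelling purposes.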
Tanner Clark

2 Answers


I would comment instead of answering, but I do not have enough reputation to do so. (I need clarification to help you, and Stack Exchange gives me no way to ask for it "properly".)

I'm not entirely sure what your end goal is. Could you clarify what as_df should look like, both after the for loop ends and after the entire script has finished running?

Will Lyles
  • Welcome to SO. I am new too, but it is a blessing sometimes. I found my mistake. Thanks for trying, and I apologize for my ambiguity. – Tanner Clark Jan 21 '19 at 20:47

Found my mistake.

as_df=as_df.drop(as_df[column])

should be

as_df=as_df.drop(column,axis=1)
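Why the original line fails: `as_df[column]` returns a Series holding the column's values (the country names), not the column's name, and `DataFrame.drop` looks labels up on the row axis (`axis=0`) by default. pandas therefore searches the row index (0, 1, 2) for "United States", "Canada" and "Mexico", which produces the `KeyError` in the question. A small reproduction on the same toy data (the `columns=` keyword form is an equivalent alternative, not something from the original answer):

```python
import pandas as pd

# Same toy data as in the question
as_df = pd.DataFrame({
    "country": ["United States", "Canada", "Mexico"],
    "points": [3, 4, 4.5],
})
column = "country"

# as_df[column] is a Series of the column's *values*, not its name
print(as_df[column].tolist())  # ['United States', 'Canada', 'Mexico']

# drop() defaults to axis=0, so those values are treated as *row* labels;
# the row index is 0, 1, 2, so pandas raises the KeyError from the question
try:
    as_df.drop(as_df[column])
except KeyError as err:
    print("KeyError:", err)

# Passing the column *name* with axis=1 (or columns=...) drops the column instead
fixed = as_df.drop(column, axis=1)
same = as_df.drop(columns=column)
print(fixed.columns.tolist())  # ['points']
```

`drop` returns a new DataFrame and leaves `as_df` unchanged, which is why the answer reassigns the result to `as_df`.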
Tanner Clark