0

I have the following function (a one-hot encoding function that takes a column as input). I want to apply it to a column in my dataframe, but I can't work out what's going wrong.

def dummies(dataframe, col):
    dataframe[col] = pd.Categorical(dataframe[col])
    pd.concat([dataframe,pd.get_dummies(dataframe[col],prefix = 'c')],axis=1)

df1 = df['X'].apply(dummies)

Guessing something is wrong with how I'm calling it?

user10939484

2 Answers

2

You need to return a value from the function; currently it returns nothing. Also, when you apply a function to a column, each value in that column is passed into the function one at a time, so your function has the wrong signature for that. Typically you'd do it like this:

def function1(value):
    new_value = value*2 #some operation
    return new_value

Then:

df['X'].apply(function1)
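
To make the element-by-element behaviour concrete, here is a small self-contained sketch. The dataframe contents are made up for illustration; only function1 and the apply call come from the snippet above.

```python
import pandas as pd

# Hypothetical data, just to have something to apply the function to
df = pd.DataFrame({"X": [1, 2, 3]})

def function1(value):
    # value is a single element of the column, not the dataframe
    new_value = value * 2
    return new_value

# apply calls function1 once per element and collects the results in a new Series
doubled = df["X"].apply(function1)
print(doubled.tolist())  # [2, 4, 6]
```

Because each call only sees one value, a function that expects a dataframe and a column name can't be used this way.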

Currently your function is set up to take the entire dataframe and the name of a column, so it will likely work if you call it like this:

df1 = dummies(df, 'X')

But you still need to add a return statement.
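
Putting those two points together, a version with the return added might look like the sketch below. The sample dataframe is hypothetical, and I've left out the pd.Categorical conversion from the question because get_dummies does not need it here and it modifies the caller's dataframe in place; keep it if you rely on that.

```python
import pandas as pd

def dummies(dataframe, col):
    # One indicator column per distinct value in col, prefixed with "c"
    encoded = pd.get_dummies(dataframe[col], prefix="c")
    # Return the original columns with the indicator columns appended
    return pd.concat([dataframe, encoded], axis=1)

# Hypothetical data for illustration
df = pd.DataFrame({"X": ["a", "b", "a"], "Y": [1, 2, 3]})

df1 = dummies(df, "X")
print(df1.columns.tolist())  # ['X', 'Y', 'c_a', 'c_b']
```

Note that the function is called directly on the dataframe, not through apply.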

Derek Eden
  • Thanks Derek, can't believe I missed the return statement. Side question.. is it possible to apply this function to multiple columns in one line? – user10939484 Nov 28 '19 at 00:20
  • like apply the same function to multiple columns individually? or apply a function that uses multiple columns as input? – Derek Eden Nov 28 '19 at 00:25
  • in either case, check these questions, they cover both: https://stackoverflow.com/questions/50519983/how-to-apply-a-function-to-multiple-columns-in-pandas and https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe – Derek Eden Nov 28 '19 at 00:27
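
For the multiple-columns follow-up in the comments, one option (not taken from the linked questions, just a sketch) is to skip the custom function and pass the column names to pd.get_dummies through its columns parameter. The column names and data here are made up.

```python
import pandas as pd

# Hypothetical data: two categorical columns and one numeric column
df = pd.DataFrame({
    "X": ["a", "b", "a"],
    "Y": ["u", "u", "v"],
    "Z": [1, 2, 3],
})

# Encode X and Y in one call; each listed column is replaced by its
# indicator columns, named with the original column name as prefix
encoded = pd.get_dummies(df, columns=["X", "Y"])
print(encoded.columns.tolist())
```

Unlike the concat approach, this drops the original X and Y columns from the result.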
-1

If you want to apply it to that one column you don't need to make a new dataframe. This is the correct syntax. Please read the docs.

df['X'] = df['X'].apply(lambda x : dummies(x))

Joseph Rajchwald