
I am trying to one-hot encode the categorical columns in my dataset. I am using the following function:

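import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def create_ohe(df, col):
    le = LabelEncoder()
    # Map each category of `col` to an integer label, shaped as a single column
    a = le.fit_transform(df[col]).reshape(-1, 1)
    # sparse=False returns a dense array (scikit-learn >= 1.2 renames this to sparse_output)
    ohe = OneHotEncoder(sparse=False)
    column_names = [col + "_" + str(i) for i in le.classes_]
    return pd.DataFrame(ohe.fit_transform(a), columns=column_names)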

I get a MemoryError when I call the function in this loop:

temp = pd.DataFrame()  # accumulates the encoded columns of every categorical column
for column in categorical_columns:
    temp_df = create_ohe(df_new, column)
    temp = pd.concat([temp, temp_df], axis=1)

Error Traceback:

MemoryError                               Traceback (most recent call last)
<ipython-input-40-9b241e8bf9e6> in <module>
      1 for column in categorical_columns:
----> 2     temp_df = create_ohe(df_new, column)
      3     temp = pd.concat([temp, temp_df], axis=1)
      4 print("\nShape of final df after one hot encoding: ", temp.shape)

<ipython-input-34-1530423fdf06> in create_ohe(df, col)
      8     ohe = OneHotEncoder(sparse=False)
      9     column_names = [col + "_" + str(i) for i in le.classes_]
---> 10     return (pd.DataFrame(ohe.fit_transform(a), columns=column_names))

MemoryError: 


A MemoryError means that Python could not allocate the memory it asked for: either your computer has used up its available memory (RAM), or the Python process itself has reached its limit (see the question "Memory errors and list limits?").
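To check whether this is simply a RAM problem, it can help to estimate the size of the array that fails to allocate. With sparse=False, OneHotEncoder returns a dense array with one column per category, and its default dtype is float64 (8 bytes per value), so each encoded column needs roughly n_rows × n_categories × 8 bytes. A rough sketch, using a hypothetical helper named dense_ohe_bytes:

```python
import numpy as np
import pandas as pd


def dense_ohe_bytes(df, col):
    """Hypothetical helper: rough size in bytes of the dense float64
    array that OneHotEncoder(sparse=False) builds for df[col]."""
    n_rows = len(df)
    n_categories = df[col].nunique()
    # One float64 value (8 bytes) per row per category
    return n_rows * n_categories * np.dtype(np.float64).itemsize
```

For instance, 1,000,000 rows and a column with 10,000 distinct values would need about 80 GB for that one array. Summing dense_ohe_bytes(df_new, c) over categorical_columns gives a rough lower bound for the whole loop.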

You could try to split the a = le.fit_transform(df_new[col]).reshape(-1,1) call into two steps. First run b = le.fit(df_new[col]), so that the label encoder is fitted on the full dataset; then transform the data in chunks, so that you do not transform every row at the same time. Maybe this helps. If b = le.fit(df_new[col]) on its own also fails, you have a general memory problem. Replace col with your own column names.
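A minimal sketch of the fit-once, transform-in-chunks idea, assuming a hypothetical generator iter_ohe_chunks and an arbitrary default chunk_size of 100,000 rows. It fits the LabelEncoder once on the whole column, fits the OneHotEncoder on the integer labels 0..n_categories-1 (so no full-size transform is needed for fitting), and then yields one dense DataFrame per chunk. Only one chunk's dense array exists at a time as long as each chunk is handled (for example, written to disk) before the next one is requested; concatenating all chunks would rebuild the full dense frame.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


def iter_ohe_chunks(df, col, chunk_size=100_000):
    """Hypothetical helper: one-hot encode df[col] chunk by chunk.

    Yields one dense DataFrame per chunk of at most chunk_size rows.
    """
    # Fit the label encoder once on the full column, so every chunk
    # uses the same category -> integer mapping.
    le = LabelEncoder().fit(df[col])
    n_categories = len(le.classes_)
    # Fit the one-hot encoder on the labels 0..n_categories-1 directly,
    # instead of transforming every row at once.
    ohe = OneHotEncoder().fit(np.arange(n_categories).reshape(-1, 1))
    column_names = [col + "_" + str(c) for c in le.classes_]

    for start in range(0, len(df), chunk_size):
        chunk = df[col].iloc[start:start + chunk_size]
        labels = le.transform(chunk).reshape(-1, 1)
        # transform() returns a sparse matrix by default; densify only this chunk
        dense = ohe.transform(labels).toarray()
        yield pd.DataFrame(dense, columns=column_names, index=chunk.index)
```

For small data, pd.concat(iter_ohe_chunks(df_new, column)) gives the same columns as create_ohe; for large data, process each chunk inside a for loop instead of concatenating them.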

fit_transform is a combination of fit and transform.
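To illustrate that last point, a small sketch with made-up values: calling fit_transform gives the same labels as calling fit and then transform on the same data, which is what makes it possible to fit once and transform separately.

```python
from sklearn.preprocessing import LabelEncoder

values = ["red", "blue", "red", "green"]

# One step: learn the categories and encode them in a single call
one_step = LabelEncoder().fit_transform(values)

# Two steps: fit() learns the sorted categories, transform() maps values to integers
le = LabelEncoder().fit(values)
two_step = le.transform(values)
```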

PV8