I have a very large DataFrame, around 80 GB. I want to change the dtype of some of its columns from object to category. I tried doing it this way:
df[col_name] = df[col_name].astype('category')
This takes around 1 minute per column, which is a lot. My first question: why does it take that long? Just running:
df[col_name].astype('category')
takes only about 1 second. I tried something like:
temp = df[col_name].astype('category')
df.drop(columns=[col_name])
df[col_name] = temp
In this case it turns out that dropping the column is also very slow. I also tried replacing drop with del, that is:
temp = df[col_name].astype('category')
del df[col_name]
df[col_name] = temp
Surprisingly (to me), this was very fast. So my second question: why is del so much faster than drop in this case? What is the most "correct" way of doing this conversion, and what is the most efficient (hopefully they are the same)? Thanks.
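For reference, here is a minimal sketch of how the three variants above could be timed side by side on a small synthetic frame. The helper names (`make_frame`, `convert_assign`, `convert_drop`, `convert_del`, `timed`) and the frame sizes are my own placeholders, not my real data. Note one assumption: `drop` returns a new DataFrame rather than modifying the original, so the sketch assigns its result back. Timings on a toy frame will not necessarily match what I see at 80 GB.

```python
import time

import numpy as np
import pandas as pd


def make_frame(n_rows=100_000, n_cols=10, seed=0):
    """Build a synthetic frame of low-cardinality string columns (placeholder data)."""
    rng = np.random.default_rng(seed)
    labels = np.array(["a", "b", "c", "d"], dtype=object)
    return pd.DataFrame(
        {f"c{i}": labels[rng.integers(0, len(labels), n_rows)] for i in range(n_cols)}
    )


def convert_assign(df, col):
    # Variant 1: convert and assign back in a single statement.
    df[col] = df[col].astype("category")
    return df


def convert_drop(df, col):
    # Variant 2: convert, drop the old column, then re-add the converted one.
    temp = df[col].astype("category")
    df = df.drop(columns=[col])  # drop returns a new DataFrame
    df[col] = temp
    return df


def convert_del(df, col):
    # Variant 3: same as variant 2, but removing the column in place with del.
    temp = df[col].astype("category")
    del df[col]
    df[col] = temp
    return df


def timed(fn, df, col):
    """Run fn(df, col) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    out = fn(df, col)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    base = make_frame()
    for fn in (convert_assign, convert_drop, convert_del):
        # Each variant gets its own copy so earlier runs do not affect later ones.
        _, elapsed = timed(fn, base.copy(), "c0")
        print(f"{fn.__name__}: {elapsed:.4f} s")
```

With variants 2 and 3 the converted column ends up as the last column of the frame, so column order changes compared with variant 1.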