Given a df with many (>500) columns of 'O' (object) dtype whose values can be converted to numbers (e.g. ['5', '3', '2.3']), we wish to convert them to ints/floats.
Our current solution uses pd.to_numeric, but it takes hours to convert all the columns.
The current code:
for col in tqdm(df.select_dtypes(include="object").columns, desc="Objects to Numeric"):
    df[col] = pd.to_numeric(df[col], errors="ignore")
We also tried to parallelize the process manually, but with no efficiency gain:
from joblib import Parallel, delayed

def to_numeric(df, col):
    df[col] = pd.to_numeric(df[col], errors="ignore")

Parallel(n_jobs=20, verbose=100)(
    delayed(to_numeric)(df, col) for col in df.select_dtypes(include="object").columns
)
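One thing worth noting about the parallel attempt: assuming joblib's default process-based backend, each task receives its own copy of df, so the assignment inside to_numeric happens in the worker and the function returns nothing to the parent. A minimal sketch of a variant that sends one column per task and collects the returned Series is below. The names convert_column and convert_object_columns are hypothetical, and the try/except stands in for errors="ignore", which is deprecated as of pandas 2.2. Whether this is actually faster on the real data has not been measured.

```python
import pandas as pd
from joblib import Parallel, delayed


def convert_column(series: pd.Series) -> pd.Series:
    """Return a numeric version of `series`, or `series` unchanged if parsing fails."""
    try:
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        # At least one value is not numeric: keep the column as it is,
        # which mirrors what errors="ignore" used to do.
        return series


def convert_object_columns(df: pd.DataFrame, n_jobs: int = 1) -> pd.DataFrame:
    """Return a copy of `df` with every convertible object column made numeric."""
    cols = list(df.select_dtypes(include="object").columns)
    # Each task gets a single column (not the whole frame) and returns its
    # result, so the parent process can collect the converted Series.
    results = Parallel(n_jobs=n_jobs)(
        delayed(convert_column)(df[col]) for col in cols
    )
    out = df.copy()
    for col, values in zip(cols, results):
        out[col] = values
    return out
```

It would be called as df = convert_object_columns(df, n_jobs=20). With n_jobs=1 joblib runs the tasks sequentially in the calling process, which makes it easy to compare against the plain loop.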
Any ideas on how to speed this up?