Given a df with many (>500) columns of 'O' (object) dtype whose values can be converted to numbers (e.g. ['5', '3', '2.3']), we want to convert them to ints/floats.

Our current solution uses pd.to_numeric, but it takes hours to convert all the columns.

The current code:

    for col in tqdm(df.select_dtypes(include="object").columns, desc="Objects to Numeric"):
        df[col] = pd.to_numeric(df[col], errors="ignore")
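For anyone who wants to reproduce this, here is a minimal sketch of the same loop on a tiny frame. The sample data and the helper name `to_numeric_or_keep` are made up for illustration. Since `errors="ignore"` is deprecated in recent pandas releases (2.2 and later, as far as I can tell), the sketch catches the parsing exception explicitly instead, which should behave the same way: a column that cannot be parsed is left unchanged.

```python
import pandas as pd

# Hypothetical sample data; the real frame has more than 500 such columns.
df = pd.DataFrame(
    {
        "a": ["5", "3", "2.3"],  # parses to floats
        "b": ["1", "2", "3"],    # parses to ints
        "c": ["x", "y", "z"],    # not numeric, should stay as object
    },
    dtype=object,
)


def to_numeric_or_keep(series):
    """Convert to a numeric dtype if every value parses; otherwise return the series unchanged."""
    try:
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        return series


for col in df.select_dtypes(include="object").columns:
    df[col] = to_numeric_or_keep(df[col])

print(df.dtypes)
```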

We also tried to parallelize the process manually, but with no efficiency gain:

    from joblib import Parallel, delayed

    def to_numeric(df, col):
        df[col] = pd.to_numeric(df[col], errors="ignore")

    Parallel(n_jobs=20, verbose=100)(
        delayed(to_numeric)(df, col) for col in df.select_dtypes(include="object").columns
    )
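One likely issue with the attempt above: with joblib's default process-based backend, each worker gets its own pickled copy of `df`, so the assignment inside the worker never reaches the parent's frame, and the whole frame is shipped for every task. Below is a hedged sketch of a variant that sends only one column per task and returns the converted column to the parent. The names `convert_column` and `convert_object_columns` are hypothetical, and whether this is actually faster on the real data is untested.

```python
import pandas as pd
from joblib import Parallel, delayed


def convert_column(series):
    """Return the series as a numeric dtype, or unchanged if it cannot be parsed."""
    try:
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        return series


def convert_object_columns(df, n_jobs=2):
    """Convert object columns in workers and assign the results in the parent process."""
    cols = list(df.select_dtypes(include="object").columns)
    # Each task receives a single column, not the whole frame.
    converted = Parallel(n_jobs=n_jobs)(
        delayed(convert_column)(df[col]) for col in cols
    )
    out = df.copy()
    for col, series in zip(cols, converted):
        out[col] = series
    return out


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real frame.
    sample = pd.DataFrame(
        {"a": ["5", "3", "2.3"], "b": ["x", "y", "z"]}, dtype=object
    )
    print(convert_object_columns(sample).dtypes)
```

The results are assigned back in the parent, so the conversion is visible after the call; the input frame itself is left untouched.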

Ideas?

InsDSt