
I need to optimize this part of my code, which takes tens of seconds on a large dataset.

        if ((isnan(data["x"][i]))==False):
            data["Visibility"][i]=int(data["Visibility"][i][0:2]) # Extract the first two numbers 
        else:
            data["x"][i]=1000 # Replace null values with 1000 

Edit: My dataset has a column of string values, and I want to replace each value with a substring of itself.

Here is an example:

"01 : visibilité horizontale 0.1km" --> 01

"02 : visibilité horizontale 0.2km" --> 02

"03 : visibilité horizontale 0.3km" --> 03

...

Amz Kab
  • Can you show a bit more? Also, it looks like you're iterating over a dataframe. That's extremely slow and very rarely (if ever) the solution. – ApplePie May 19 '21 at 11:06
  • If all you want to do is replace the NaNs or missing values in a dataframe, look at the [fillna function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html) which should do it much more efficiently than iterating over the dataframe – pu239 May 19 '21 at 11:08
  • Please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker May 19 '21 at 11:16
  • And just a side note: there is no need to ever compare to `True` or `False` in `if` statements. You can (and should) write something like `if not isnan(data["x"][i]):` – Gulzar May 19 '21 at 11:49
  • I added some clarification to my question. – Amz Kab May 19 '21 at 11:59

1 Answer


What's slowing you down is looping over the data frame row by row instead of using pandas' built-in vectorized functions.

WITHOUT A FOR LOOP:

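mask = isnan(data["x"])  # assumes isnan is numpy.isnan, which works element-wise on a Series
data.loc[~mask, "Visibility"] = data.loc[~mask, "Visibility"].str[:2].astype(int)  # first two characters, as int
data.loc[mask, "Visibility"] = 1000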



The above code is untested, as you provided no reproducible example.
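For illustration, here is a minimal, self-contained sketch of the same vectorized idea. The sample rows are made up from the strings shown in the question, and the helper name `extract_visibility_code` is hypothetical. It also assumes the missing values sit in the `Visibility` column itself rather than in a separate `x` column, so adapt the column names to your real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modeled on the strings in the question.
data = pd.DataFrame({
    "Visibility": [
        "01 : visibilité horizontale 0.1km",
        "02 : visibilité horizontale 0.2km",
        np.nan,  # a missing value
        "03 : visibilité horizontale 0.3km",
    ]
})


def extract_visibility_code(column: pd.Series, fill_value: int = 1000) -> pd.Series:
    """Return the leading two-character code as an int, filling missing values."""
    codes = column.str[:2]                         # missing values stay missing
    codes = pd.to_numeric(codes, errors="coerce")  # "01" -> 1.0, non-numeric -> NaN
    return codes.fillna(fill_value).astype(int)    # replace NaN, then cast to int


# One pass over the whole column, no Python-level loop.
data["Visibility"] = extract_visibility_code(data["Visibility"])
print(data["Visibility"].tolist())
```

Because the result is cast to `int`, the leading zero is dropped ("01" becomes 1), just as with `int(...)` in the question's loop. If you need to keep "01" as text, stop after the `.str[:2]` step.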

Gulzar