How to improve a for loop in Python

Question

I have this code:

    import numpy as np

    for row in range(len(df[col])):
        # strip the thousands separators ("1,430" -> "1430") and convert to int
        df[col][row] = int(df[col][row].replace(',', ''))
    df[col] = df[col].astype(int)
    df[col] = np.round(df[col] / 500) * 500  # rounds the numbers to the closest multiple of 500
    df[col] = df[col].astype(int)  # round returns a float; this turns it back into an int

In the `for` loop, `df[col][row].replace(',', '')` removes the commas from numbers that are stored as strings (object dtype), such as `1,430`, and `int()` then converts the result to an integer such as `1430`.
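For comparison, the same cleanup can be done on the whole column at once with the `.str` accessor. This is a minimal sketch; the column name `amount` and the sample values are hypothetical, and it assumes every cell is a string:

```python
import pandas as pd

# Hypothetical sample data: numbers stored as strings with thousands separators
df = pd.DataFrame({"amount": ["1,430", "12,000", "980"]})
col = "amount"

# Remove every comma in one call, then convert the whole column to int
df[col] = df[col].str.replace(",", "", regex=False).astype(int)

print(df[col].tolist())  # [1430, 12000, 980]
```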

I then have to add `df[col] = df[col].astype(int)` (after the loop the column still has `object` dtype, even though every value is now an int), because otherwise the following `np.round()` call throws the error `'float' object has no attribute 'rint'`.

After `np.round()` I have to call `.astype(int)` again, because `np.round()` returns floats and I want integers.
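A small sketch of why the second cast is needed, using a hypothetical Series that is already numeric: `np.round` keeps the float dtype produced by the division, so casting right after rounding, before multiplying by 500, gives integers directly. Note that halves round to the nearest even number, so 750 and 1250 both become 1000.

```python
import numpy as np
import pandas as pd

s = pd.Series([1430, 1200, 750, 1250])  # hypothetical, already integer dtype

# np.round keeps the float64 dtype of s / 500, which is why a cast is needed
as_float = np.round(s / 500) * 500
print(as_float.dtype)  # float64

# Casting right after rounding, before multiplying, keeps the result integer
as_int = np.round(s / 500).astype(int) * 500
print(as_int.tolist())  # [1500, 1000, 1000, 1000]
```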

This takes considerably long to run, even though my DataFrame is only 32 x 17.

Is there any way I could improve it?
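If the data is read from a delimited text file (an assumption; the question doesn't say where `df` comes from), one option is to let `pandas.read_csv` strip the separators while parsing via its `thousands` parameter. The semicolon-separated sample below is hypothetical:

```python
import io

import pandas as pd

# Hypothetical semicolon-separated input; in practice this would be a file path
raw = io.StringIO("item;amount\na;1,430\nb;12,000\n")

# thousands="," tells the parser to drop the separator while converting,
# so the column is read as integers from the start
df = pd.read_csv(raw, sep=";", thousands=",")

print(df["amount"].tolist())  # [1430, 12000]
```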

Hi there, welcome to SO! Please see [ask] and [mcve]. Also, you don't need the loop; you can do `df[col].str.replace(',', '').astype(int)`, but I'm not entirely sure what you're trying to do. — Umar.H, Aug 06 '20 at 15:10
Does this answer your question? [Convert Pandas Dataframe to Float with commas and negative numbers](https://stackoverflow.com/questions/42192323/convert-pandas-dataframe-to-float-with-commas-and-negative-numbers), and then just use `astype(int)` — MrNobody33, Aug 06 '20 at 15:20
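A caveat when using `replace` for this: by default `Series.replace` only replaces cells whose whole value equals the search string, so it leaves `'1,430'` untouched. Removing a substring needs either the `.str` accessor or `regex=True`. A quick check with hypothetical data:

```python
import pandas as pd

s = pd.Series(["1,430", "12,000"])

# Series.replace compares whole cell values by default; "1,430" != ","
# so nothing changes
unchanged = s.replace(",", "")

# Either of these removes the comma inside each string
via_str = s.str.replace(",", "", regex=False)  # substring replace
via_regex = s.replace(",", "", regex=True)     # regex match inside values

print(unchanged.tolist())              # ['1,430', '12,000']
print(via_str.astype(int).tolist())    # [1430, 12000]
print(via_regex.astype(int).tolist())  # [1430, 12000]
```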

score 0 · Answer 1 · answered Aug 06 '20 at 15:17


Would a more general replace using a lambda function, `df[col].apply(lambda x: x.replace(',', ''))`, be more suitable and time-efficient?
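For reference, `Series.apply` calls the function once per element, so inside the lambda `x` is a plain Python string and takes `str.replace`, not the `.str` accessor. A sketch with hypothetical data, comparing it with the `.str` accessor:

```python
import pandas as pd

s = pd.Series(["1,430", "12,000"])

# apply passes each element on its own, so x is a str here
via_apply = s.apply(lambda x: x.replace(",", ""))

# The .str accessor gives the same result without writing a lambda
via_accessor = s.str.replace(",", "", regex=False)

print(via_apply.tolist() == via_accessor.tolist())  # True
```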

And wouldn't a one-liner like this give you what you are after?

    df[col] = (df[col] / 500).round().astype(int) * 500
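Be aware that `astype(int)` truncates toward zero rather than rounding, so dividing and casting alone can land on the wrong multiple of 500. A sketch with hypothetical values showing the difference:

```python
import pandas as pd

s = pd.Series([1430, 1200])

# astype(int) truncates: 1430 / 500 = 2.86 becomes 2
truncated = (s / 500).astype(int) * 500

# Rounding first gives the nearest multiple of 500
rounded = (s / 500).round().astype(int) * 500

print(truncated.tolist())  # [1000, 1000]
print(rounded.tolist())    # [1500, 1000]
```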

answered by Jonathan

It's not necessary to use `apply` when you select a single column; you can just use `df[col].str.replace(',','')` ;) – MrNobody33 Aug 06 '20 at 15:22

score 0 · Answer 2 · answered Aug 06 '20 at 15:20


Don't use `for row in range(len(df[col])):`; iterate over the values directly with `for value in df[col]:`.
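Iterating over the values directly gives you each value rather than a row position, so you can't assign back cell by cell. A common pattern is to build a new list and assign it in one step. A minimal sketch with a hypothetical column:

```python
import pandas as pd

df = pd.DataFrame({"amount": ["1,430", "12,000", "980"]})  # hypothetical data
col = "amount"

# Iterate over the values themselves (no positional lookups or chained
# assignment), build a new list, and assign it back in one step
df[col] = [int(value.replace(",", "")) for value in df[col]]

print(df[col].tolist())  # [1430, 12000, 980]
```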

Or, instead of a `for` loop, use one of these:

`DataFrame.replace` to replace one string with another (pass `regex=True` to replace a substring, such as the comma inside `1,430`)

or, better, a lambda with `DataFrame.apply` (Example here)
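When several columns need the same treatment, `DataFrame.replace` with `regex=True` and the rounding arithmetic can both be applied to a subset of columns at once. The column names and values below are hypothetical:

```python
import pandas as pd

# Hypothetical DataFrame with several comma-formatted columns
df = pd.DataFrame({
    "q1": ["1,430", "2,260"],
    "q2": ["12,000", "980"],
})
cols = ["q1", "q2"]

# regex=True makes DataFrame.replace work on substrings inside each cell
df[cols] = df[cols].replace(",", "", regex=True).astype(int)

# Round every selected column to the nearest multiple of 500 in one expression
df[cols] = (df[cols] / 500).round().astype(int) * 500

print(df["q1"].tolist(), df["q2"].tolist())  # [1500, 2500] [12000, 1000]
```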

answered by StefanMZ
