I have a for loop that changes the address format of over 500,000 rows. It works, but it takes a long time to run. Is there a way to make it more efficient?

for lab, row in df.iterrows():
    df.loc[lab,"Address"] = (row["Address"].title())   
Barmar
Agatha

1 Answer

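Avoid iterrows() here. In general, stay away from custom row-wise iteration in pandas whenever a vectorized method exists, because each row is handled in slow Python code instead of one column-wide operation. For your specific purpose, use the .str accessor instead: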

df.assign(Address=lambda d: d["Address"].str.title())

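This returns a new dataframe with the updated column. It does not modify df in place, so assign the result back (df = df.assign(...)) if you want to keep the change.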
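As a minimal sketch of keeping the change, the column can also be overwritten directly. The sample addresses below are made up for illustration, and the missing value is included only to show that the .str methods pass missing entries through, whereas calling .title() on a missing value in the original loop would raise an error.

```python
import pandas as pd

# Hypothetical sample data; the real frame has 500,000+ rows.
df = pd.DataFrame({"Address": ["123 MAIN STREET", "45 oak avenue", None]})

# Vectorized title-casing of the whole column, written back to df.
# Missing values stay missing instead of raising an error.
df["Address"] = df["Address"].str.title()

result = df["Address"].tolist()
print(result)
```

Overwriting the column and reassigning the output of assign() end up with the same values; assign() is mostly convenient inside a method chain.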

Here is the speed test as requested. On my data, the vectorized version is about 266x faster (100 ms versus 26.6 s).

%timeit df.assign(Address=lambda d: d["Address"].str.title())
# 100 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit for lab, row in df.iterrows(): df.loc[lab,"Address"] = (row["Address"].title()) 
# 26.6 s ± 2.63 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
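If you are not in IPython or Jupyter, where %timeit is unavailable, a rough comparison can be sketched with time.perf_counter. The synthetic addresses, the 2,000-row size, and the helper names with_iterrows and with_str_title are assumptions for illustration; timings will vary by machine and will not match the numbers above.

```python
import time

import pandas as pd

# Hypothetical data: 2,000 synthetic addresses (kept small so the slow path finishes quickly).
base = pd.DataFrame({"Address": [f"{i} north elm street" for i in range(2_000)]})


def with_iterrows(frame):
    """Row-by-row approach from the question, applied to a copy."""
    out = frame.copy()
    for lab, row in out.iterrows():
        out.loc[lab, "Address"] = row["Address"].title()
    return out


def with_str_title(frame):
    """Vectorized approach from the answer; returns a new dataframe."""
    return frame.assign(Address=lambda d: d["Address"].str.title())


start = time.perf_counter()
slow = with_iterrows(base)
slow_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = with_str_title(base)
fast_seconds = time.perf_counter() - start

# Both approaches should produce identical values.
same_result = slow["Address"].tolist() == fast["Address"].tolist()
print(f"iterrows: {slow_seconds:.4f}s, str.title: {fast_seconds:.4f}s, same result: {same_result}")
```

A single timed run like this is noisy, so treat it as a sanity check and not as a benchmark.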