I have to clean up the dataframe column "id" so that every value has a length of five; any value shorter than five needs to be left-padded with zeros.
The following code works great on a small dataframe; however, when I ran the for loop against my larger dataframe of ~500k rows, it still hadn't finished after 30 minutes.
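import pandas as pd

# sample dataframe
df1 = pd.DataFrame({'id': ['676', '1931'],
                    'fu': ['bar', 'baz']})

# for loop used to update id
for id in df1['id']:
    if len(id) < 5:
        # number of zeros needed to reach a length of five
        delta = 5 - len(id)
        new_id = ("0" * delta) + id
        # boolean mask scans the whole column on every iteration
        df1.loc[df1['id'] == id, 'id'] = new_id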
Can I speed this up? Is there an alternative to .loc that I could use?
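For reference, here is a minimal sketch of a vectorized alternative I am considering. It assumes the "id" column holds strings, as in the sample above (a numeric column would need `.astype(str)` first). The loop is probably slow because `df1['id'] == id` rescans the entire column on every iteration, whereas `Series.str.zfill` pads every value in a single pass:

```python
import pandas as pd

# same sample dataframe as above
df1 = pd.DataFrame({'id': ['676', '1931'],
                    'fu': ['bar', 'baz']})

# left-pad every id with zeros to a width of five;
# values that are already five or more characters long are left unchanged
# (assumes the column contains strings)
df1['id'] = df1['id'].str.zfill(5)

print(df1['id'].tolist())  # ['00676', '01931']
```

`df1['id'].str.rjust(5, '0')` should give the same result for plain digit strings; `zfill` differs only in that it keeps a leading sign character in front of the padding.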