Pretty self explanatory, one line, looks pythonic
df['city'] + (df['city'] == 'paris')*('_' + df['arr'].astype(str))
s = """city,arr,final_target
paris,11,paris_11
paris,12,paris_12
dallas,22,dallas
miami,15,miami
paris,16,paris_16"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(s)).sample(1000000, replace=True)
df
Speeds
%%timeit
df['city'] + (df['city'] == 'paris')*('_' + df['arr'].astype(str))
# 877 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
df['final_target'] = np.where(df['city'].eq('paris'),
df['city'] + '_' + df['arr'].astype(str),
df['city'])
# 874 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I am not sure why this example fails(update: failing due to sampling) but memory error is still a mystery Why memory error when using .loc in pandas with sampling instead of direct computing
%%timeit
df['final_target'] = df['city']
df.loc[df['city'] == 'paris', 'final_target'] += '_' + df['arr'].astype(str)
MemoryError: Unable to allocate 892. GiB for an array with shape (119671145392,) and data type int64