I saw quite a variety of answers, so I decided to compare them in terms of speed:
import numpy as np
import pandas as pd

# Create a large test DataFrame: 4 tickers x 100,000 = 400,000 rows
df = pd.DataFrame({'bloomberg_ticker_y' : ['AIM9', 'DJEM9', 'FAM9', 'IXPM9']})
df = pd.concat([df]*100000)
df.shape
#Out
(400000, 1)
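`%%timeit` is an IPython/Jupyter cell magic, so the snippets below only run as-is in a notebook or IPython shell. As a rough sketch for a plain Python script, the standard-library `timeit` module can produce a similar "mean ± std. dev." report. The `bench` and `where_version` names are my own, and the run counts are deliberately small so the script finishes quickly, so the numbers will not match the ones below exactly.

```python
import statistics
import timeit

import numpy as np
import pandas as pd

# Same test data as above: 4 tickers repeated 100,000 times -> 400,000 rows
df = pd.DataFrame({'bloomberg_ticker_y': ['AIM9', 'DJEM9', 'FAM9', 'IXPM9']})
df = pd.concat([df] * 100000)


def bench(func, number=1, repeat=3):
    """Hypothetical helper: time func() and return per-loop (mean, stdev) in ms."""
    # Each entry is the total time for `number` calls, so divide to get per-loop time
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_loop_ms = [t / number * 1000 for t in totals]
    return statistics.mean(per_loop_ms), statistics.stdev(per_loop_ms)


def where_version():
    # The np.where approach from the first comparison below
    s = df['bloomberg_ticker_y']
    return np.where(s.str.len() > 4, s.str[3:], s)


mean_ms, std_ms = bench(where_version)
print(f"{mean_ms:.0f} ms ± {std_ms:.1f} ms per loop (mean ± std. dev. of 3 runs, 1 loop each)")
```

Any of the other approaches can be wrapped in a zero-argument function and passed to `bench` the same way.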
CS95 #1 np.where
%%timeit
np.where(df['bloomberg_ticker_y'].str.len() > 4,
         df['bloomberg_ticker_y'].str[3:],
         df['bloomberg_ticker_y'])
Result:
163 ms ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
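Note that `np.where` returns a plain NumPy array rather than a Series, so the original index is lost. A small, untimed sketch on the four sample tickers showing the output and one way to wrap it back into a Series:

```python
import numpy as np
import pandas as pd

s = pd.Series(['AIM9', 'DJEM9', 'FAM9', 'IXPM9'], name='bloomberg_ticker_y')

# Keep characters from position 3 onward only for tickers longer than 4 chars
out = np.where(s.str.len() > 4, s.str[3:], s)
print(type(out))     # <class 'numpy.ndarray'>
print(out.tolist())  # ['AIM9', 'M9', 'FAM9', 'M9']

# Rebuild a Series aligned to the original index so it can be assigned back
result = pd.Series(out, index=s.index, name=s.name)
```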
CS95 #2 map-based solution
%%timeit
df['bloomberg_ticker_y'].map(lambda x: x[3:] if len(x) > 4 else x)
Result:
86 ms ± 7.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
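Unlike `np.where`, `Series.map` returns a Series that keeps the original index, so the result can be assigned straight back to a column. It is still a Python-level loop: the lambda is called once per value. A minimal sketch on the sample data, where `short_ticker` is a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({'bloomberg_ticker_y': ['AIM9', 'DJEM9', 'FAM9', 'IXPM9']})

# The lambda runs once per element; the returned Series shares df's index
df['short_ticker'] = df['bloomberg_ticker_y'].map(lambda x: x[3:] if len(x) > 4 else x)
print(df)
```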
Yatu Series.mask
%%timeit
df.bloomberg_ticker_y.mask(df.bloomberg_ticker_y.str.len().gt(4),
                           other=df.bloomberg_ticker_y.str[-2:])
Result:
187 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
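This answer keeps the last two characters (`str[-2:]`) instead of dropping the first three (`str[3:]`). The two agree on the 5-character tickers in the test data but diverge for longer strings. A quick check, where `'ABCDEF'` is a made-up ticker used only to show the difference:

```python
import pandas as pd

s = pd.Series(['AIM9', 'DJEM9', 'FAM9', 'IXPM9', 'ABCDEF'])
long_mask = s.str.len().gt(4)

# mask() replaces values where the condition is True with the matching `other` value
last_two = s.mask(long_mask, other=s.str[-2:])
drop_three = s.mask(long_mask, other=s.str[3:])

print(last_two.tolist())    # ['AIM9', 'M9', 'FAM9', 'M9', 'EF']
print(drop_three.tolist())  # ['AIM9', 'M9', 'FAM9', 'M9', 'DEF']
```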
Vlemaistre list comprehension
%%timeit
[x[-2:] if len(x)>4 else x for x in df['bloomberg_ticker_y']]
Result:
84.8 ms ± 4.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
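The list comprehension returns a plain Python list rather than a Series. Assigning a list to a column is positional, so it works even with the repeated index labels that `pd.concat([df]*100000)` produces, as long as the lengths match. A small sketch (3 copies instead of 100,000; `short_ticker` is a hypothetical column name):

```python
import pandas as pd

df = pd.DataFrame({'bloomberg_ticker_y': ['AIM9', 'DJEM9', 'FAM9', 'IXPM9']})
# Duplicate the rows the same way as the benchmark, so the index repeats 0..3
df = pd.concat([df] * 3)

# A list is assigned by position, so the repeated index labels do not matter
df['short_ticker'] = [x[-2:] if len(x) > 4 else x for x in df['bloomberg_ticker_y']]
print(df['short_ticker'].tolist()[:4])  # ['AIM9', 'M9', 'FAM9', 'M9']
```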
pault str.replace with regex
%%timeit
df["bloomberg_ticker_y"].str.replace(r".{3,}(?=.{2}$)", "", regex=True)
Result:
324 ms ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
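Since pandas 2.0, `Series.str.replace` treats the pattern as a literal string by default (`regex=False`), so `regex=True` is needed for this approach to do anything. In the pattern, `.{3,}` consumes at least three characters and the lookahead `(?=.{2}$)` requires exactly two characters to remain, so only strings of five or more characters are cut down to their last two. A small check on the sample tickers:

```python
import pandas as pd

s = pd.Series(['AIM9', 'DJEM9', 'FAM9', 'IXPM9'])
pattern = r".{3,}(?=.{2}$)"

# With regex=True the pattern is applied; 4-char tickers have no match and stay as-is
with_regex = s.str.replace(pattern, "", regex=True)
print(with_regex.tolist())  # ['AIM9', 'M9', 'FAM9', 'M9']

# With regex=False the pattern is searched for literally, so nothing changes
literal = s.str.replace(pattern, "", regex=False)
print(literal.tolist())
```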
Cobra DataFrame.apply
%%timeit
df.apply(lambda x: (x['bloomberg_ticker_y'][3:]
                    if len(x['bloomberg_ticker_y']) > 4
                    else x['bloomberg_ticker_y']),
         axis=1)
Result:
6.83 s ± 387 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
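Before comparing timings, it is worth confirming that every approach returns the same values. A sketch on the four sample tickers (small data, not a benchmark); converting each result to a list makes the array, Series and list outputs directly comparable. The `results` dictionary and its keys are my own labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'bloomberg_ticker_y': ['AIM9', 'DJEM9', 'FAM9', 'IXPM9']})
s = df['bloomberg_ticker_y']

# One result per approach from the comparison, each converted to a plain list
results = {
    'np.where': np.where(s.str.len() > 4, s.str[3:], s).tolist(),
    'map': s.map(lambda x: x[3:] if len(x) > 4 else x).tolist(),
    'mask': s.mask(s.str.len().gt(4), other=s.str[-2:]).tolist(),
    'list comprehension': [x[-2:] if len(x) > 4 else x for x in s],
    'str.replace': s.str.replace(r".{3,}(?=.{2}$)", "", regex=True).tolist(),
    'DataFrame.apply': df.apply(
        lambda row: row['bloomberg_ticker_y'][3:]
        if len(row['bloomberg_ticker_y']) > 4
        else row['bloomberg_ticker_y'],
        axis=1,
    ).tolist(),
}

# Every approach should produce the same list as the first one
reference = results['np.where']
for name, values in results.items():
    print(f"{name:20} {values == reference}")
```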
Conclusion