I'm trying to use numpy.select to replace string values within a column: if a string contains a keyword, the whole string should be replaced with another keyword (there are about 25 combinations).
df["new_col"] = np.select(
    condlist=[
        df["col"].str.contains("cat1", na=False, case=False),
        df["col"].str.contains("cat2", na=False, case=False),
        df["col"].str.contains("cat3", na=False, case=False),
        df["col"].str.contains("cat4", na=False, case=False),
        # ...
        df["col"].str.contains("cat25", na=False, case=False),
    ],
    choicelist=[
        "NEW_cat1",
        "NEW_cat2",
        "NEW_cat3",
        "NEW_cat4",
        # ...
        "NEW_cat25",
    ],
    default="DEFAULT_cat",
)
Is there a more concise way, or should I just repeat str.contains(...) within condlist 25 times? Is numpy.select the proper way to do this at all?
I assume a dict could be used here, but I don't see how exactly. df["col"].map(d), where d is a dict of old and new values like {"cat1": "NEW_cat1"}, wouldn't work (?) since I can't hardcode the exact values that need to be replaced (which is why I'm using str.contains).
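For illustration, here is a minimal sketch of how the dict idea might be combined with str.contains instead of map: the keys drive a list comprehension that builds condlist, and the values become choicelist. The sample DataFrame and the three dict entries are made up for the example, and this is only one possible approach, not necessarily the best one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real contents of df["col"] are not shown above.
df = pd.DataFrame(
    {"col": ["has cat1 inside", "CAT2 here", "nothing", None, "xx cat3"]}
)

# Keyword -> replacement. Dict order sets the priority, because np.select
# uses the first condition that is True for each row.
d = {"cat1": "NEW_cat1", "cat2": "NEW_cat2", "cat3": "NEW_cat3"}

# One boolean mask per keyword, built in the same order as the dict keys.
condlist = [df["col"].str.contains(key, na=False, case=False) for key in d]

df["new_col"] = np.select(condlist, list(d.values()), default="DEFAULT_cat")
print(df)
```

Note that str.contains treats each key as a regular expression by default, and that overlapping keywords (for example "cat2" and "cat25") would both match the same string, so the more specific key would have to come first in the dict.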