Is it possible to replace strings from one column with corresponding strings from another column in a pandas DataFrame using only the pandas.Series.str
methods? "No" is an acceptable answer so long as it's accompanied by the pandas version and the relevant part of the docs.
Here's an example:
import pandas as pd
# version >= 0.19.2
df = pd.DataFrame(
    {
        'names': ['alice', 'bob', 'catherine', 'slagathor'],
        'hobbies': [
            'alice likes to knit',
            'bob likes to bowl',
            'plays with her cats',
            'slagathor burniates peasants for fun'
        ]
    }
)
def clean(df: pd.DataFrame) -> pd.DataFrame:
    ...  # do the substitutions
assert all(
    clean(df).hobbies == pd.Series([
        'likes to knit',
        'likes to bowl',
        'plays with her cats',
        'burniates peasants for fun'
    ])
)
In this case, I'd like to strip the strings in the names column out of the hobbies column, using something like
df.hobbies.str.replace('(' + df.names + r'\s*)?', '') # doesn't work
So far, I've had to fall back on re.sub, row by row:
import re
df['replaced'] = pd.Series(
    re.sub(f'^{df.names[i]} ?', '', df.hobbies[i]) for i in df.index
)
as in the answer to "Replace values from one column with another column Pandas DataFrame".
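For reference, here is a minimal sketch of that row-wise workaround wrapped up as the clean function from the example. It is not a .str-only solution: as far as I can tell, Series.str.replace expects a single str or compiled regex as its pattern, not a Series, which would explain why the attempt above fails. The re.escape call and the copy of the frame are my own assumptions (names treated as literal text, input left unmodified), and the name is only stripped when it starts the hobby string.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        'names': ['alice', 'bob', 'catherine', 'slagathor'],
        'hobbies': [
            'alice likes to knit',
            'bob likes to bowl',
            'plays with her cats',
            'slagathor burniates peasants for fun',
        ],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Strip each row's name (and any following whitespace) from the start of its hobby."""
    out = df.copy()  # assumption: the caller's frame should not be mutated
    out['hobbies'] = [
        # re.escape guards against names that contain regex metacharacters
        re.sub(rf'^{re.escape(name)}\s*', '', hobby)
        for name, hobby in zip(out['names'], out['hobbies'])
    ]
    return out


cleaned = clean(df)
```

Pairing the columns with zip, rather than indexing with df.names[i], keeps the result aligned with the frame even when the index is not the default 0..n-1 range.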