Try this:
```python
import pandas as pd

# setup
df = pd.DataFrame({
    "data": ['scarborough, scarborough, scarborough', 'london,london', 'north york, north york', 'test,test']
})

# logic
def custom_dedup(s):
    # split on commas, strip whitespace, dedupe, then take the single remaining value
    return [*set([part.strip() for part in s.split(',')])][0]

df['data'].apply(custom_dedup)
```
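How it works:
- `split()`: split the string on commas, which yields a list
- `strip()`: remove leading and trailing whitespace from each string in the list
- `set()`: keep only the unique elements of that list
- `[*...][0]`: we assume each set holds exactly one element, so take the first (and only) one. Sets are unordered, so if a cell held several distinct values, which one comes back would be arbitrary.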
Input:

```
   data
0  scarborough, scarborough, scarborough
1  london,london
2  north york, north york
3  test,test
```
Output:

```
0    scarborough
1         london
2     north york
3           test
```
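Because `set()` does not preserve order, `[0]` only gives a predictable result when every cell repeats a single value. If cells might hold several different values, a variant along these lines could help. It is a sketch, assuming you want to keep all distinct values joined by `', '`; `dedup_keep_order` is a hypothetical name. It uses `dict.fromkeys`, which drops duplicates while keeping first-seen order:

```python
import pandas as pd

df = pd.DataFrame({
    "data": ['scarborough, scarborough, scarborough', 'london,london', 'north york, north york', 'test,test']
})

def dedup_keep_order(s):
    # dict keys are unique and (since Python 3.7) keep insertion order,
    # so this drops repeats while preserving the first-seen order
    unique = dict.fromkeys(part.strip() for part in s.split(','))
    # join the remaining values so distinct entries are not silently dropped
    return ', '.join(unique)

result = df['data'].apply(dedup_keep_order)
print(result)
```

On the sample data this gives the same four values as above, while `dedup_keep_order('a, b, a')` returns `'a, b'` instead of an arbitrary single element.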