How to detect string suffixes and remove the suffixed elements from a list? I understand that this looks like an NLP stemming/lemmatization task, but the task requires a simpler function.
Given a list, I need to remove elements that have an 's' or 'es' suffix if the non-suffixed item also exists in the list:
alist = ['bar','barbar','foo','foos','barbares','foofoos','bares']
The expected output is:
alist = ['bar','barbar','foo','foofoos']
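One way to do this without depending on sort order is to put every word in a set and, for each word, check whether stripping a trailing 'es' or 's' yields another word that is in the list. This is a minimal sketch, assuming only those two suffixes matter and that the original list order should be kept; `remove_suffixed` and its `suffixes` parameter are hypothetical names, not part of any library.

```python
def remove_suffixed(words, suffixes=("es", "s")):
    """Return `words` without any word that is another word in the list
    plus one of `suffixes` (hypothetical helper, plain suffix matching only)."""
    present = set(words)  # set gives O(1) membership checks
    kept = []
    for word in words:
        # The word is "suffixed" if removing a suffix leaves a word we already have.
        is_suffixed = any(
            word.endswith(suf) and word[: -len(suf)] in present
            for suf in suffixes
        )
        if not is_suffixed:
            kept.append(word)  # preserves the original order
    return kept


alist = ['bar', 'barbar', 'foo', 'foos', 'barbares', 'foofoos', 'bares']
print(remove_suffixed(alist))  # ['bar', 'barbar', 'foo', 'foofoos']
```

Because each word is checked against the whole set rather than against its neighbour, 'bares' is dropped even though 'barbar' sorts between it and 'bar', while 'foofoos' is kept because 'foofoo' is not in the list.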
I've tried the following, but it doesn't work. The loop only compares each word with the previously kept word, and sorting alist gives ['bar', 'barbar', 'barbares', 'bares', 'foo', 'foofoos', 'foos']
rather than the order I assumed, ['bar', 'bares', 'barbar', 'barbares', 'foo', 'foos', 'foofoos']
alist = ['bar','barbar','foo','foos','barbares','foofoos','bares']
prev = ""        # last word that was kept
no_s_list = []
for i in sorted(alist):
    # skip "<prev>es" and "<prev>s" only when <prev> was the last kept word
    if i[-2:] == "es" and i[:-2] == prev:
        continue
    elif i[-1:] == "s" and i[:-1] == prev:
        continue
    else:
        prev = i
        no_s_list.append(i)
The loop iterates over the sorted list, which is:
>>> sorted(alist)
['bar', 'barbar', 'barbares', 'bares', 'foo', 'foofoos', 'foos']
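To see concretely why this misses some words, here is the attempt wrapped in a function with the `contine` typo fixed (`prev_based_filter` is a hypothetical name used only for illustration). It drops a word only when its stem is the most recently kept word, so any word that sorts between a stem and its suffixed form breaks the match.

```python
def prev_based_filter(words):
    """The sorted/prev approach from the question, wrapped for testing."""
    prev = ""   # last word that was kept
    kept = []
    for w in sorted(words):
        if w[-2:] == "es" and w[:-2] == prev:
            continue
        elif w[-1:] == "s" and w[:-1] == prev:
            continue
        else:
            prev = w
            kept.append(w)
    return kept


alist = ['bar', 'barbar', 'foo', 'foos', 'barbares', 'foofoos', 'bares']
print(prev_based_filter(alist))
# ['bar', 'barbar', 'bares', 'foo', 'foofoos', 'foos']
```

'bares' survives because 'barbar' becomes the last kept word before it, and 'foos' survives because 'foofoos' sorts between 'foo' and 'foos'.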