How to detect string suffixes and remove these suffixed elements from list? - Python

Question

How can I detect string suffixes and remove the suffixed elements from a list? I understand that this looks like an NLP stemming/lemmatization task, but I need a simpler function.

Given the list below, I need to remove elements that have an s or es suffix if the non-suffixed item also exists in the list:

alist = ['bar','barbar','foo','foos','barbares','foofoos','bares']

I need to output:

alist = ['bar','barbar','foo','foofoos']

I've tried the following, but it doesn't work: sorting alist gives ['bar', 'barbar', 'barbares', 'bares', 'foo', 'foofoos', 'foos'] rather than ['bar', 'bares', 'barbar', 'barbares', 'foo', 'foos', 'foofoos'], so a suffixed item does not always come right after its stem.

alist = ['bar','barbar','foo','foos','barbares','foofoos','bares']

prev = ""
no_s_list = []
for i in sorted(alist):
  if i[-2:] == "es" and i[:-2] == prev:
    continue
  elif i[-1:] == "s" and i[:-1] == prev:
    continue
  else:
    prev = i
    no_s_list.append(i)

Sorting puts the items in this order:

>>> sorted(alist)
['bar', 'barbar', 'barbares', 'bares', 'foo', 'foofoos', 'foos']

http://stackoverflow.com/questions/771918/how-do-i-do-word-stemming-or-lemmatization — Hoopdady, Mar 05 '13 at 14:30

mgilson · Accepted Answer · 2013-03-05T14:38:29.600

8

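def rm_suffix(s, suffixes):
    # Strip the first matching suffix; return s unchanged if none matches.
    for suf in suffixes:
        if s.endswith(suf):
            return s[:-len(suf)]
    return s

alist = ['bar','barbar','foo','foos','barbares','foofoos','bares']
salist = set(alist)  # set for fast membership tests
suffixes = ('es','s')
# Keep x unless it ends with a suffix and its stem is also in the list.
blist = [x for x in alist
         if (not x.endswith(suffixes)) or (rm_suffix(x,suffixes) not in salist)]
print(blist)  # ['bar', 'barbar', 'foo', 'foofoos']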
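
One edge case worth noting: rm_suffix strips only the first suffix that matches, so with ('es','s') the word 'bares' is reduced to 'bar' and never to 'bare'. If the list held 'bare' and 'bares' but not 'bar', 'bares' would be kept. The following is a minimal sketch of a variant that tries every matching suffix, assuming Python 3; the name drop_suffixed is hypothetical and not part of the answer above.

```python
def drop_suffixed(words, suffixes=("es", "s")):
    """Return words without items whose suffix-stripped form is also present."""
    known = set(words)
    kept = []
    for word in words:
        # Every stem obtained by stripping one matching suffix from word.
        stems = [word[:-len(suf)] for suf in suffixes if word.endswith(suf)]
        if any(stem in known for stem in stems):
            continue  # a non-suffixed form exists, so drop this item
        kept.append(word)
    return kept


alist = ['bar', 'barbar', 'foo', 'foos', 'barbares', 'foofoos', 'bares']
print(drop_suffixed(alist))              # ['bar', 'barbar', 'foo', 'foofoos']
print(drop_suffixed(['bare', 'bares']))  # ['bare']
```

Whether 'bares' should count as a plural of 'bare' depends on the data, so this is a design choice rather than a correction.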

edited Mar 05 '13 at 14:38

answered Mar 05 '13 at 14:33

mgilson


Thanks a million. should have thought of `str.endswith` =) – alvas Mar 05 '13 at 14:35

I was going to suggest to use a different sorting function, but this solution is definitely cleaner. – Gorbag Mar 05 '13 at 14:46

Ashwini Chaudhary · Answer 2 · 2013-03-05T14:57:19.337

You can also use regex here:

re.split() will return something like:

barbar --> ['barbar']

foos --> ['foo', 's', '']

barbares --> ['barbar', 'es', '']

foofoos --> ['foofoo', 's', '']

So, if the returned list has more than one element and its first element (the stem) is found in alist, you can remove the item.

code:

In [105]: import re

In [106]: alist = ['bar','barbar','foo','foos','barbares','foofoos','bares']

In [107]: s=set(alist)

In [108]: for x in s.copy():
   .....:     sol=re.split(r'(es|s)$',x)
   .....:     if len(sol)>1 and sol[0] in s:
   .....:         s.remove(x)
   .....:

In [109]: s
Out[109]: set(['bar', 'foofoos', 'barbar', 'foo'])
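
Because a set is unordered, the result above loses the original list order. As a rough sketch, assuming Python 3, the same re.split idea can be wrapped in a function that keeps the order of alist; the names SUFFIX_RE and drop_suffixed_regex are hypothetical.

```python
import re

# Matches a trailing 'es' or 's'; the capturing group makes split() return the suffix too.
SUFFIX_RE = re.compile(r'(es|s)$')


def drop_suffixed_regex(words):
    """Drop items whose stem (the item minus 'es' or 's') is also in words."""
    known = set(words)
    kept = []
    for word in words:
        parts = SUFFIX_RE.split(word)
        # parts is [word] when no suffix matched, otherwise [stem, suffix, ''].
        if len(parts) > 1 and parts[0] in known:
            continue
        kept.append(word)
    return kept


alist = ['bar', 'barbar', 'foo', 'foos', 'barbares', 'foofoos', 'bares']
print(drop_suffixed_regex(alist))  # ['bar', 'barbar', 'foo', 'foofoos']
```

Membership is checked against a separate, unmodified set, so the outcome does not depend on the order in which items are visited.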
