I have a list of values, say list = ['Copper', 'Wood', 'Glass', 'Metal'], and a string, string = 'Wood table with small glass center,little bit of metal'.

I need to check which of the list values appear in my string, but ignore the less prominent ones (here glass and metal) based on nearby words. I tried re.findall and got Wood, Glass, Metal as output. How can I ignore 'Glass' and 'Metal' in this case by using nearby keywords such as 'small' and 'little'?

Expected Output = [Wood]

Priya

1 Answer

My understanding: you want to drop the list values that appear shortly after words such as 'small' and 'little' in the string.

Code:

import re

lst = ['Copper', 'Wood', 'Glass', 'Metal']
string = 'Wood table with small glass center,little bit of metal'
keywords = ['small', 'little']

# Lowercase the text and pull out runs of letters/digits, so punctuation
# such as the comma in "center,little" separates words.
words_lst = re.findall(r'[a-z0-9]+', string.lower())

final = []
for elem in lst:  # lst itself is left unchanged
    elem = elem.lower()
    # Match whole words only; skip values that do not occur in the text.
    if elem not in words_lst:
        continue
    index = words_lst.index(elem)  # first occurrence only
    # Look at the word itself and up to four words before it.
    slice_index = max(index - 4, 0)
    range_lst = words_lst[slice_index:index + 1]
    # Keep the value only if no keyword appears in that window.
    if elem not in final and not any(keyword in range_lst for keyword in keywords):
        final.append(elem)

Output:

>>> final
['wood']
Sushil
  • The list should stay as it is; values are not supposed to be removed from it. Suppose another row in my dataframe column has the string 'metal framework covered with glass': with the same standard list, the output should be ['metal','glass']. A value should be ignored only when it is preceded by the 'small' or 'little' keywords, as in the example above. – Priya Oct 28 '20 at 16:17
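
Following up on the comment above, here is a minimal sketch (not the answerer's code) that wraps the nearby-word check in a function, so the standard list is never modified and the same call can be run on every row. The function name `prominent_values` and the `window` parameter are hypothetical, chosen for illustration. It assumes a value should be ignored only when a keyword occurs within the four words before it, and it checks every occurrence of a value rather than only the first.

```python
import re


def prominent_values(text, values, keywords, window=4):
    """Return the values found in text that are not preceded by a keyword.

    An occurrence is skipped when any keyword is among the `window` words
    immediately before it. Results are lowercase, in the order of `values`.
    """
    # Lowercase and split on anything that is not a letter or digit.
    words = re.findall(r"[a-z0-9]+", text.lower())
    lowered_keywords = {k.lower() for k in keywords}
    result = []
    for value in values:
        value = value.lower()
        # Every position where the value occurs as a whole word.
        positions = [i for i, w in enumerate(words) if w == value]
        for i in positions:
            preceding = words[max(i - window, 0):i]
            # Keep the value if this occurrence has no keyword before it.
            if lowered_keywords.isdisjoint(preceding):
                if value not in result:
                    result.append(value)
                break
    return result


standard = ['Copper', 'Wood', 'Glass', 'Metal']  # reused as is for every row
keywords = ['small', 'little']
rows = [
    'Wood table with small glass center,little bit of metal',
    'metal framework covered with glass',
]
results = [prominent_values(row, standard, keywords) for row in rows]
print(results)  # [['wood'], ['glass', 'metal']]
```

Note that the results follow the order of the standard list, so the second row gives ['glass', 'metal'] rather than ['metal', 'glass']. For a pandas column, the same function could be passed to `Series.apply`, for example `df['description'].apply(prominent_values, args=(standard, keywords))`, where `'description'` is a placeholder column name.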