
I have thousands of items within check_word_list.

I want to replace each item in check_word_list with its matching name from correct_product_name_list, keeping the order of check_word_list, so that the result is exactly result_list.

Each pair of corresponding items is similar (e.g. correct_product_name_list[3] is similar to check_word_list[1]): they are the same product under different names. I want to rename each item to its name in correct_product_name_list so that the result matches result_list. How do I do that?

correct_product_name_list = [
'DANCOW FORTIGRO INSTANT COKLAT SUSU BUBUK 40GR',
'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR)',
'HAVERJOY ROLLED OATS 1 KG - DUS',    
'KONDOM SUTRA CLASSIC ISI 24 PCS', 
'KONDOM SUTRA OK ISI 12 PCS', 
'TISSUE BASAH DETOL WET WIPES ISI 50 LEMBAR',
'VASELINE HYPOALLERGENIC REPAIRING JELLY BABY 50ML', 
]

check_word_list = [
'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR) - Coklat, 5 pcs',
'KONDOM SUTRA CLASSIC ISI 1 3 12 24 SUTERA MERAH not durex / fiesta - 3pcs', 
'HAVERJOY HAVERMOUT ROLLED OATS 1 KG - KUNING',  
'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR) - Vanila, 5 pcs', 
'TISU BASAH / TISSUE BASAH DETOL WET WIPES ISI 50 LEMBAR',  
'VASELINE HYPOALLERGENIC REPAIRING JELLY BABY 50 ML', 
'Kondom Sutra Ok isi 1 / 3 / 12 / 24 bukan durex / fiesta - 12pcs']

result_list = [
'DANCOW FORTIGRO INSTANT COKLAT SUSU BUBUK 40GR',  
'KONDOM SUTRA CLASSIC ISI 24 PCS', 
'HAVERJOY ROLLED OATS 1 KG - DUS', 
'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR)', 
'TISSUE BASAH DETOL WET WIPES ISI 50 LEMBAR',
'VASELINE HYPOALLERGENIC REPAIRING JELLY BABY 50ML', 
'KONDOM SUTRA OK ISI 12 PCS']
David Buck
Stephen
  • What have you tried so far and where are you stuck? – David Buck Sep 30 '21 at 21:08
  • I have no idea how to find similarities between the lists and build a new one. I read this article, which seems to describe what I want to achieve, but it is hard to understand: https://towardsdatascience.com/calculating-string-similarity-in-python-276e18a7d33a. Is it possible to solve this without using keywords? Maybe, if two items in different lists share the most words, the item from correct_product_name_list could be copied onto the result list. I really have no idea whether that is possible. – Stephen Sep 30 '21 at 21:21
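The "most matching words" idea from the comment above can be done with plain Python sets. This is a minimal sketch, assuming case and punctuation don't matter; the helper names words and most_shared_words are hypothetical. On the sample data every item except the first lands on its expected name in result_list. The Coklat sachet shares 10 words with 'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR)' and only 5 with 'DANCOW FORTIGRO INSTANT COKLAT SUSU BUBUK 40GR', so word counting alone gives it the sachet name.

```python
import re

correct_product_name_list = [
    'DANCOW FORTIGRO INSTANT COKLAT SUSU BUBUK 40GR',
    'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR)',
    'HAVERJOY ROLLED OATS 1 KG - DUS',
    'KONDOM SUTRA CLASSIC ISI 24 PCS',
    'KONDOM SUTRA OK ISI 12 PCS',
    'TISSUE BASAH DETOL WET WIPES ISI 50 LEMBAR',
    'VASELINE HYPOALLERGENIC REPAIRING JELLY BABY 50ML',
]

check_word_list = [
    'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR) - Coklat, 5 pcs',
    'KONDOM SUTRA CLASSIC ISI 1 3 12 24 SUTERA MERAH not durex / fiesta - 3pcs',
    'HAVERJOY HAVERMOUT ROLLED OATS 1 KG - KUNING',
    'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR) - Vanila, 5 pcs',
    'TISU BASAH / TISSUE BASAH DETOL WET WIPES ISI 50 LEMBAR',
    'VASELINE HYPOALLERGENIC REPAIRING JELLY BABY 50 ML',
    'Kondom Sutra Ok isi 1 / 3 / 12 / 24 bukan durex / fiesta - 12pcs',
]


def words(text):
    """Upper-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[A-Z0-9]+", text.upper()))


def most_shared_words(item, candidates):
    """Return the candidate sharing the most words with item (earliest wins ties)."""
    item_words = words(item)
    return max(candidates, key=lambda c: len(item_words & words(c)))


# One result per item, in the same order as check_word_list
result_list = [most_shared_words(item, correct_product_name_list) for item in check_word_list]
```

Ties go to whichever candidate comes first in correct_product_name_list, because max() returns the first maximal item it sees.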

1 Answer


Maybe you could try what is described in this answer, basically:

  1. loop over check_word_list
  2. for each item, loop over correct_product_name_list
  3. compare the similarity of both strings with difflib

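In Python (keeping only the best-scoring match for each item):

import difflib

result_list = []

# edit this to match your requirements
match_threshold = 0.5

for check_word in check_word_list:
    best_match, best_ratio = None, 0.0
    for correct_word in correct_product_name_list:
        # ratio() is a similarity score from 0.0 to 1.0; upper() makes the comparison case-insensitive
        ratio = difflib.SequenceMatcher(None, check_word.upper(), correct_word.upper()).ratio()
        if ratio > best_ratio:
            best_match, best_ratio = correct_word, ratio
    # one entry per item, so result_list lines up with check_word_list (None = no match)
    result_list.append(best_match if best_ratio >= match_threshold else None)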

Note: the linked answer suggests other libraries in case this one doesn't work for you.
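Before switching libraries, it may be worth trying difflib.get_close_matches(word, possibilities, n, cutoff), which runs the inner loop and the threshold check in a single call and returns the best matches first. A minimal sketch, assuming the function's default cutoff of 0.6 and upper-casing each item so case differences don't count against it; items with no candidate above the cutoff become None. The cutoff is an assumption and would need checking against the real data.

```python
import difflib

correct_product_name_list = [
    'DANCOW FORTIGRO INSTANT COKLAT SUSU BUBUK 40GR',
    'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR)',
    'HAVERJOY ROLLED OATS 1 KG - DUS',
    'KONDOM SUTRA CLASSIC ISI 24 PCS',
    'KONDOM SUTRA OK ISI 12 PCS',
    'TISSUE BASAH DETOL WET WIPES ISI 50 LEMBAR',
    'VASELINE HYPOALLERGENIC REPAIRING JELLY BABY 50ML',
]

check_word_list = [
    'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR) - Coklat, 5 pcs',
    'KONDOM SUTRA CLASSIC ISI 1 3 12 24 SUTERA MERAH not durex / fiesta - 3pcs',
    'HAVERJOY HAVERMOUT ROLLED OATS 1 KG - KUNING',
    'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR) - Vanila, 5 pcs',
    'TISU BASAH / TISSUE BASAH DETOL WET WIPES ISI 50 LEMBAR',
    'VASELINE HYPOALLERGENIC REPAIRING JELLY BABY 50 ML',
    'Kondom Sutra Ok isi 1 / 3 / 12 / 24 bukan durex / fiesta - 12pcs',
]

# Assumed threshold: candidates scoring below it are dropped (0.6 is the default)
CUTOFF = 0.6

result_list = []
for check_word in check_word_list:
    # n=1 keeps only the single highest-scoring candidate that passes the cutoff;
    # the candidate names are already upper-case, so only the query is converted
    matches = difflib.get_close_matches(check_word.upper(), correct_product_name_list, n=1, cutoff=CUTOFF)
    result_list.append(matches[0] if matches else None)
```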

Agate
  • If you know the order, and it's just a fill down exercise, you can use Excel for this. If you want to find similarities between strings, you can use fuzzy matching. Check out the link below and see if it helps your cause. https://towardsdatascience.com/fuzzy-string-matching-in-python-68f240d910fe – ASH Oct 01 '21 at 01:08
  • I have tried difflib.SequenceMatcher, but it doesn't give precise results. Thank you, though; I will try the other methods you suggested. – Stephen Oct 01 '21 at 08:36
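SequenceMatcher compares characters, so the extra text in check_word_list (flavours, pack counts, "bukan durex / fiesta") and the lower-case "Kondom Sutra Ok" all drag the ratio down. A minimal sketch of an alternative, under stated assumptions: score candidates by word overlap (Jaccard similarity, i.e. shared words divided by all distinct words), reject anything below an assumed MIN_SCORE of 0.3, and keep a hypothetical MANUAL_OVERRIDES table for mappings no string score can infer. The Coklat sachet is one of those, since it shares more words with the sachet product than with the 40GR Coklat product. On the sample data this reproduces result_list; on thousands of items both the threshold and the override table would need tuning.

```python
import re

correct_product_name_list = [
    'DANCOW FORTIGRO INSTANT COKLAT SUSU BUBUK 40GR',
    'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR)',
    'HAVERJOY ROLLED OATS 1 KG - DUS',
    'KONDOM SUTRA CLASSIC ISI 24 PCS',
    'KONDOM SUTRA OK ISI 12 PCS',
    'TISSUE BASAH DETOL WET WIPES ISI 50 LEMBAR',
    'VASELINE HYPOALLERGENIC REPAIRING JELLY BABY 50ML',
]

check_word_list = [
    'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR) - Coklat, 5 pcs',
    'KONDOM SUTRA CLASSIC ISI 1 3 12 24 SUTERA MERAH not durex / fiesta - 3pcs',
    'HAVERJOY HAVERMOUT ROLLED OATS 1 KG - KUNING',
    'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR) - Vanila, 5 pcs',
    'TISU BASAH / TISSUE BASAH DETOL WET WIPES ISI 50 LEMBAR',
    'VASELINE HYPOALLERGENIC REPAIRING JELLY BABY 50 ML',
    'Kondom Sutra Ok isi 1 / 3 / 12 / 24 bukan durex / fiesta - 12pcs',
]

# Hypothetical table for mappings that word overlap cannot infer
MANUAL_OVERRIDES = {
    'DANCOW FORTIGRO SACHET INSTAN SUSU BUBUK (10 X 27 GR) - Coklat, 5 pcs':
        'DANCOW FORTIGRO INSTANT COKLAT SUSU BUBUK 40GR',
}

# Assumed minimum score, picked by eye on this small sample only
MIN_SCORE = 0.3


def words(text):
    """Case-insensitive set of alphanumeric words in text."""
    return set(re.findall(r"[A-Z0-9]+", text.upper()))


def jaccard(a, b):
    """Shared words divided by all distinct words, from 0.0 to 1.0."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def match(item):
    """Return the override if present, else the best candidate above MIN_SCORE, else None."""
    if item in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[item]
    best = max(correct_product_name_list, key=lambda c: jaccard(item, c))
    return best if jaccard(item, best) >= MIN_SCORE else None


result_list = [match(item) for item in check_word_list]
```

Returning None instead of a forced guess makes items that need a new override easy to find on the full list.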