There are two lists: one holds the character sequence of a sentence, and the other holds words.
The objective is to match the items of the first list against the second, len(li_a) times.
Every word that matches is temporarily saved as a candidate. After the final iteration, the longest candidate is chosen as the expected result and appended to a new list.
Since there are 18 characters in li_a, let's assume the number of iterations is 18.
li_a = ['T','h','o','m','a','s','h','a','d','a','h','a','r','d','t','i','m','e']
li_words = ['a','The','Thomas','have','had','has','hard','hot','time','tea']
First, the first item in li_a is matched against li_words.
1. 'T' => li_words || li_a[0] => li_words
2. 'Th' => li_words || li_a[0]+li_a[1] => li_words
3. 'Tho' => li_words || li_a[0]+li_a[1]+li_a[2] => li_words
...
6. 'Thomas' => li_words || li_a[0]+..+li_a[5] => li_words (marks as candidate when the match is found)
...
18. 'Thomashadahardtime' => li_words || li_a[0]..li_a[17] => li_words
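The steps listed above can be sketched as a single pass that grows a prefix one item at a time and remembers the longest one found in li_words. This is only a minimal sketch of that description; the function name longest_prefix_match and its return shape are assumptions made for illustration.

```python
def longest_prefix_match(items, words):
    """Return (word, item_count) for the longest prefix of items that is in words, or None."""
    vocabulary = set(words)  # set lookup instead of scanning the list each time
    best = None
    candidate = ''
    for count, item in enumerate(items, start=1):
        candidate += item  # 'T', 'Th', 'Tho', ...
        if candidate in vocabulary:
            best = (candidate, count)  # a later match is always longer
    return best


li_a = list('Thomashadahardtime')
li_words = ['a', 'The', 'Thomas', 'have', 'had', 'has', 'hard', 'hot', 'time', 'tea']
print(longest_prefix_match(li_a, li_words))  # ('Thomas', 6)
```

The returned count says how many items of li_a the match consumed, which is what the next pass needs in order to know where to resume.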
The above example shows how the first iterative process should work. It gives us one candidate result, Thomas. The matched items of li_a, from the first 'T' to 's' (Thomas), are then removed:
li_a = ['h','a','d','a','h','a','r','d','t','i','m','e']
A second iterative process, like the previous one, should then be performed to retrieve the next word.
Finally, the resulting list should look like this:
final_li = ['Thomas','had','a','hard','time']
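The whole procedure described above could be sketched roughly as follows: find the longest word that starts at the current position, append it, skip the matched items, and repeat. This is one possible reading of the requirement, not a definitive implementation. The name segment is made up for illustration, and the sketch assumes every part of li_a is covered by some word in li_words (it raises ValueError otherwise).

```python
def segment(items, words):
    """Greedy longest-match segmentation of items into known words."""
    vocabulary = set(words)
    result = []
    start = 0
    while start < len(items):
        candidate = ''
        best_word, best_end = None, start
        # Grow the candidate one item at a time from the current position.
        for end in range(start, len(items)):
            candidate += items[end]
            if candidate in vocabulary:
                best_word, best_end = candidate, end + 1
        if best_word is None:
            # Assumption of this sketch: an uncovered position is an error.
            raise ValueError('no word in the list starts at position %d' % start)
        result.append(best_word)
        start = best_end  # drop the matched items and continue
    return result


li_a = list('Thomashadahardtime')
li_words = ['a', 'The', 'Thomas', 'have', 'had', 'has', 'hard', 'hot', 'time', 'tea']
print(segment(li_a, li_words))  # ['Thomas', 'had', 'a', 'hard', 'time']
```

Because the items are only ever concatenated, the same loop should also work when each item is a multi-character syllable rather than a single letter.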
Attempt
The attempt below finds the longest match, but it does not handle the iterative step and gives inaccurate results when a word is missing from li_words:
def matched_substring(li1, li2):
    new_li = []
    tmp = ''
    for a in li1:
        tmp += a
        count = 0
        for b in li2:
            if tmp == b:
                count += 1
        if count == 0:
            tmp1 = tmp.replace(a, '')
            new_li.append(tmp1)
            tmp = a
    if li2.__contains__(tmp):
        new_li.append(tmp)
    return new_li
It returns: ['Thomas', 'h', 'a', 'd', 'a', 'h', 'a', 'r', 'd', 't', 'i', 'm']
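For the case where a word is missing from li_words, one option is to fall back to emitting the unmatched item on its own and moving on. The sketch below tries the longest slice first and shrinks it until a known word is found. The fallback policy and the name segment_with_fallback are assumptions for illustration, not something the question specifies.

```python
def segment_with_fallback(items, words):
    """Greedy longest-match segmentation; unknown items are kept as single tokens."""
    vocabulary = set(words)
    result = []
    start = 0
    while start < len(items):
        # Try the longest slice first and shrink until a known word is found.
        for end in range(len(items), start, -1):
            candidate = ''.join(items[start:end])
            if candidate in vocabulary:
                break
        else:
            # Assumed policy: no known word starts here, so keep the item as-is.
            candidate, end = items[start], start + 1
        result.append(candidate)
        start = end
    return result


li_a = list('Thomashadahardtime')
li_words = ['a', 'The', 'Thomas', 'have', 'has', 'hard', 'hot', 'time', 'tea']  # 'had' removed
print(segment_with_fallback(li_a, li_words))
# ['Thomas', 'h', 'a', 'd', 'a', 'hard', 'time']
```

Note that 'a' is still matched inside the unknown stretch because it is itself a word in the list; a different fallback policy (for example, merging consecutive unknown items) would give a different split.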
THE CHARACTERS IN UNICODE
string_a = "['ဒီ|စစ်|ဆေး|မှု|ကို|သီး|ခြား|လွတ်|လပ်|တဲ့|ပု|ဂ္ဂို|လ်|တ|ဦး|က|ဦး|ဆောင်|ခိုင်း|တာ|ဟာ|လူ|ထု|အ|ကျိုး|အ|တွက်|ဖြစ်|တယ်|လို့|တ|ရား|ရေး|ဝန်|ကြီး|ဌာ|န|က|ထုတ်|ပြန်|တဲ့|ကြေ|ညာ|ချက်|ထဲ|မှာ|ဖေါ်|ပြ|ထား|ပါ|တယ်']"
To convert the above string to a list:
## Get rid of the brackets and quotation marks
strp_str = string_a.strip("[]")
strp_str = strp_str.strip("'")
## Now we have li_a
li_a = strp_str.split('|')
Link to the li_words list: mm-words.txt
## Read all the words from the file
read_words = open('mm-words.txt','r')
## Store them in a list
li_words = read_words.read().split('\n')
## Now run the function
print analyze(li_a, li_words)
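One detail that may matter when loading the word list: splitting on '\n' leaves an empty string if the file ends with a newline, and entries can carry stray whitespace such as '\r', either of which would prevent exact matches. A minimal Python 3 sketch of a cleanup step is shown below; the helper name clean_words is hypothetical, and the UTF-8 encoding of mm-words.txt is an assumption.

```python
def clean_words(lines):
    """Strip surrounding whitespace from each line and drop blank entries."""
    return {line.strip() for line in lines if line.strip()}


# Hypothetical usage with the file from the question (encoding is assumed):
# with open('mm-words.txt', encoding='utf-8') as handle:
#     li_words = clean_words(handle)

print(sorted(clean_words(['a\r\n', '', 'had\n', '   '])))  # ['a', 'had']
```

Returning a set also makes each membership test cheap, which helps when the word list is large.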