0

I have two lists, and I want to find the items that share the same or partially matching characters and put the results in a dictionary:

list_a= ['helloyou', 'waithere', 'byenow']

list_b =[ 'wait', 'hello', 'bye']

Result wanted:

dict_c= {'helloyou:hello', 'waithere:wait', 'byenow:bye'}

I've tried this, but it doesn't seem to work:

dict_c= {j:i for i,j in zip(list_a,list_b) if re.match(j,i)} 

EDIT:

I may have some items that don't start the same, for example:

list_a= ['helloyou', 'waithere', 'byenow']

list_b =[ 'yeswait', 'plushello', 'nobye']

Result wanted:

dict_c= {'helloyou:plushello', 'waithere:yeswait', 'byenow:nobye'}

EDIT

What if I instead had a situation like this, where I could use a separator to split the items and then use startswith?

list_a = ['hid/jui/helloyou', 'hhh/hdhdh/waithere', 'jcdjcjd/bdcdbc/byenow']

list_b = ['abc/efg/waitai_lp', 'hil/mno/helloai_lj', 'pqr/byeai_ki']

Result wanted

dict_c = {'hid/jui/helloyou:hil/mno/helloai_lj','hhh/hdhdh/waithere:abc/efg/waitai_lp', 'jcdjcjd/bdcdbc/byenow:pqr/byeai_ki'}

Eys
  • Will the longer string always be in `list_a`, or could it be in either? – Reti43 Feb 13 '21 at 22:35
  • could be either! – Eys Feb 13 '21 at 22:40
  • The latest updated question doesn't make sense. How would you know when the overlap between the 2 strings is sufficient to match? `'helloyou:plushello'` match because `hello` is common, but so would `abcd:blkm` because they have `b` in common. – Akshay Sehgal Feb 13 '21 at 22:43
  • This completely changes your primary question. and makes the problem a completely different problem. It was a good question until that edit completely changed the whole problem :) – Akshay Sehgal Feb 13 '21 at 22:44
  • @AkshaySehgal thanks for your reply, yes that's why I'm struggling to find a solution. what if I could instead have a situation like this where I could use a separator to split the items and use the start with list_a = ['hid/jui/helloyou', 'hhh/hdhdh/waithere', 'jcdjcjd/bdcdbc/byenow'] list_b = ['abc/efg/waitai_lp', 'hil/mno/helloai_lj', 'pqr/byeai_ki'] Result wanted dict_c = {'hid/jui/helloyou:hil/mno/helloai_lj', 'hhh/hdhdh/waithere:abc/efg/waitai_lp', 'jcdjcjd/bdcdbc/byenow:pqr/byeai_ki'} – Eys Feb 13 '21 at 23:20
  • How would you even come up with a separator like that? Let's take an example: `"hidesign"`. Is it `"hide/sign"` or `"hi/design"` according to you? Does this example show you that what you are trying to solve is essentially unsolvable? – Akshay Sehgal Feb 13 '21 at 23:34
  • Let me add, you want these to be paired together right? `'hid/jui/helloyou:hil/mno/helloai_lj'` .. How do you know that `helloyou` is comprised of `hello` & `you` and not `hell` & `oyou`? You would need a separator for THAT to say `"hello/you"` – Akshay Sehgal Feb 13 '21 at 23:55
  • For any future changes, you may want to extend the "if" condition with a more flexible approach, such as a function that determines whether the pair under consideration satisfies your expectations. I would suggest opening a new question for future changes; repeatedly editing the same question is impolite, and people may stop following it – Marco Massetti Feb 15 '21 at 22:59

3 Answers

1

The issue is that you are zipping corresponding items of the two lists instead of taking the cross-product between them. With zip, each item of list_a is compared only with the item at the same position in list_b, so only the pair (byenow, bye) would return something from re.match. Try taking the product instead:

import re
from itertools import product

# Compare every item of list_a with every item of list_b
{j: i for i, j in product(list_a, list_b) if re.match(j, i)}
# Result: {'hello': 'helloyou', 'wait': 'waithere', 'bye': 'byenow'}
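To make the difference between zip and product concrete, here is a small self-contained sketch that runs both on the lists from the question. The names `zipped` and `crossed` are only illustrative, and wrapping the pattern in `re.escape` is an added precaution (an assumption, not part of the original answer) so that characters such as `.` or `+` in an item are matched literally.

```python
import re
from itertools import product

list_a = ['helloyou', 'waithere', 'byenow']
list_b = ['wait', 'hello', 'bye']

# zip pairs items by position: (helloyou, wait), (waithere, hello), (byenow, bye)
zipped = {j: i for i, j in zip(list_a, list_b)
          if re.match(re.escape(j), i)}

# product compares every item of list_a with every item of list_b
crossed = {j: i for i, j in product(list_a, list_b)
           if re.match(re.escape(j), i)}

print(zipped)   # only the pair that happens to line up by position
print(crossed)  # every prefix match, regardless of position
```

Note that `re.match` only anchors at the start of the string, so this still finds prefixes only, not matches in the middle of an item.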
Akshay Sehgal
1
list_a= ['helloyou', 'waithere', 'byenow']

list_b =[ 'wait', 'hello', 'bye']

dict_c = {a:b for a in list_a for b in list_b if a.startswith(b)}

Output dict_c: {'helloyou': 'hello', 'waithere': 'wait', 'byenow': 'bye'}
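For the separator variant from the last edit of the question, one possible sketch is to compare only the last "/"-separated segment of each item and pair every item of list_a with the item of list_b that shares the longest common prefix. This swaps `startswith` for `os.path.commonprefix`, since neither segment is a full prefix of the other there. The helper names `last_segment` and `best_match` and the cutoff `MIN_PREFIX = 3` are hypothetical choices for illustration, not something given in the question.

```python
from os.path import commonprefix

list_a = ['hid/jui/helloyou', 'hhh/hdhdh/waithere', 'jcdjcjd/bdcdbc/byenow']
list_b = ['abc/efg/waitai_lp', 'hil/mno/helloai_lj', 'pqr/byeai_ki']

MIN_PREFIX = 3  # hypothetical cutoff: shortest shared prefix accepted as a match


def last_segment(item):
    """Return the text after the final '/' (the whole item if there is none)."""
    return item.rsplit('/', 1)[-1]


def best_match(a, candidates, min_prefix=MIN_PREFIX):
    """Return the candidate whose last segment shares the longest prefix with a's."""
    scored = [
        (len(commonprefix([last_segment(a), last_segment(b)])), b)
        for b in candidates
    ]
    length, match = max(scored, key=lambda pair: pair[0])
    return match if length >= min_prefix else None


dict_c = {}
for a in list_a:
    match = best_match(a, list_b)
    if match is not None:
        dict_c[a] = match

print(dict_c)
```

Because it measures the shared prefix rather than requiring one string to start with the other, this also works when the longer string is in either list. It still cannot tell a meaningful overlap from a coincidental one, which is the concern raised in the comments.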

RJ Adriaansen
0

You can consider using the method explained here, combined with @Akshay's answer:

from collections import Counter
from itertools import product

def shared_chars(s1, s2):
    return sum((Counter(s1) & Counter(s2)).values())

{j:i for i,j in product(list_a, list_b) if shared_chars(j,i) >= 3}

The hard part is choosing the constant 3 dynamically, based on parameters such as the lengths of the strings being compared. For now, I used the length of the shortest word, "bye", as the minimum.
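One way the threshold could be made relative to the string lengths is to divide the shared-character count by the length of the shorter string and compare that ratio with a cutoff. The sketch below tries this on the second example from the question. The function name `similar` and the value `min_ratio=0.7` are assumptions picked for this data, not a general rule.

```python
from collections import Counter
from itertools import product

list_a = ['helloyou', 'waithere', 'byenow']
list_b = ['yeswait', 'plushello', 'nobye']


def shared_chars(s1, s2):
    """Count the characters the two strings have in common (with multiplicity)."""
    return sum((Counter(s1) & Counter(s2)).values())


def similar(s1, s2, min_ratio=0.7):
    """True if the shared characters cover at least min_ratio of the shorter string."""
    shorter = min(len(s1), len(s2))
    return shorter > 0 and shared_chars(s1, s2) / shorter >= min_ratio


# Keys come from list_a here, to match the orientation of the wanted result
dict_c = {i: j for i, j in product(list_a, list_b) if similar(i, j)}

print(dict_c)
```

Counting characters ignores their order, so anagrams score as a perfect match. With other data the cutoff may need tuning, or an order-aware measure may be a better fit.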

Marco Massetti