0

I would like to get the intersection of two lists of words using regex. Because the regex engine is implemented in C, it runs faster, which is of huge importance in this particular case. The code I have almost works, but it also matches embedded words, for example "buy" inside "buyers".

Some code probably explains it better. This is what I have so far:

re.findall(r"(?=(" + '|'.join(['buy', 'sell', 'gilt']) + r"))", ' '.join(['aabuya', 'gilt', 'buyer']))
>> ['buy', 'gilt', 'buy']

While this is what I would like:

re.exactfindall(['buy', 'sell', 'gilt'], ['aabuya', 'gilt', 'buyer'])
>>['gilt']

Thanks.

ylnor
  • If I understand correctly, you're basically looking for intersection of two lists?(one is your list from sentences and another is a given list.) see answer here: http://stackoverflow.com/questions/3697432/how-to-find-list-intersection – xbb May 04 '17 at 19:25
  • I'm talking about regular-expression here actually. But thanks – ylnor May 04 '17 at 19:39

2 Answers

1

To do this with a regexp, the easiest way is probably to add word boundaries (\b) to the pattern, outside the capture group, giving you:

re.findall(r"\b(?=(" + '|'.join(['buy', 'sell', 'gilt']) + r")\b)",
    ' '.join(['aabuya', 'gilt', 'buyer']))

which outputs ['gilt'] as requested.
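The same idea can be wrapped in a small helper. This is only a sketch: the name exact_findall is hypothetical (it mirrors the re.exactfindall wished for in the question and is not part of the re module), and it assumes the words consist of word characters only. re.escape is added so that a word containing regex metacharacters is matched literally, and the lookahead is dropped because the two \b anchors are enough when overlapping matches are not needed.

```python
import re

def exact_findall(words, candidates):
    """Return the candidates that equal one of `words` (hypothetical helper)."""
    # Escape each word so characters like '.' or '+' are matched literally.
    alternation = "|".join(re.escape(w) for w in words)
    # \b on both sides forces whole-word matches; (?:...) is non-capturing,
    # so findall returns the full match for each hit.
    pattern = re.compile(r"\b(?:" + alternation + r")\b")
    return pattern.findall(" ".join(candidates))

print(exact_findall(['buy', 'sell', 'gilt'], ['aabuya', 'gilt', 'buyer']))
# ['gilt']
```

Note that \b treats any non-word character as a boundary, so 'buy' would still be found inside a hyphenated entry such as 'buy-out'.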

JohanL
0
listgiven=['aabuya', 'gilt', 'buyer']
listtomatch=['buy', 'sell', 'gilt']
exactmatch = [x for x in listgiven if x in listtomatch]
print(exactmatch)
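A possible variation on the snippet above, offered as a sketch rather than a benchmarked recommendation: converting the list to match into a set first turns each membership test into a hash lookup instead of a scan through the list. The variable names below are renamed copies of the ones above, and whether this beats the regex version is an assumption worth timing on real data.

```python
list_given = ['aabuya', 'gilt', 'buyer']
list_to_match = ['buy', 'sell', 'gilt']

# Build the set once; each `x in wanted` check is then a hash lookup.
wanted = set(list_to_match)

# Keep the order (and any duplicates) of list_given.
exact_match = [x for x in list_given if x in wanted]
print(exact_match)
# ['gilt']
```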
user2510479
  • Thanks, but since regex is implemented in C and runs faster, I'd rather try to find a solution using regex.findall if possible... – ylnor May 04 '17 at 20:05