1

I have a list of keywords, for example:

keywords = ['airbnb.com', 'booking', 'deliveroo.uk - UK', ...]

My goal is to define the parameter token_pattern of CountVectorizer by concatenating all keywords.

The idea is this:

token_pattern = '|'.join([pattern_keyword_1, pattern_keyword_2, ...])

What interests me is that it matches the exact occurrences in the text and not the substrings.

For example, if I have 'def.com' in the keywords I DON'T want it to match 'abcdef.com'.

Is it possible to do it?

Thanks in advance.

LJG
  • 601
  • 1
  • 7
  • 15

0 Answers0