I'm trying to search word-tokenized abstracts for custom stem words using Python. The following code is almost what I want. That is, does any of the values in stem_words appear one or more times in word_tokenized_abstract?

if any(word in stem_words for word in word_tokenized_abstract):
    pass  # do stuff

where...

  • stem_words is a list of strings only
  • word_tokenized_abstract is a list of strings only

I based the above on the question "One-liner to check if at least one item in list exists in another list?"

My issue is that my stem_words are of different lengths. I've tried the following code (a modification of the above), which did not work for me. I've tried a few other modifications, but they either don't work or cause a crash.

if(any(word in stem_words for word[0:len(word)] in word_tokenized_abstract)):
    do stuff

That is, do any of the values in word_tokenized_abstract begin with any of the values in stem_words?

If it helps, my stem_words = ['pancrea', 'muscul', 'derma', 'ovar']

Thanks! I apologize if this question has been answered previously but I couldn't find it.

Community

1 Answer

So you want to check whether any string in the first list (stem_words) is contained in any of the strings of the second list (word_tokenized_abstract).

I'd try this:

any(y.startswith(x) for y in word_tokenized_abstract for x in stem_words)

Explanation: the expression is True as soon as some word y in word_tokenized_abstract starts with some stem x in stem_words, and False if no word/stem pair matches.
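As a minimal sketch of the same prefix check, str.startswith also accepts a tuple of prefixes, so the inner loop over stems can be dropped. The function name starts_with_any_stem and the two sample abstracts are made up for illustration; only stem_words comes from the question.

```python
def starts_with_any_stem(words, stems):
    """Return True if at least one word begins with at least one stem."""
    prefixes = tuple(stems)  # str.startswith accepts a tuple of prefixes
    return any(word.startswith(prefixes) for word in words)


stem_words = ['pancrea', 'muscul', 'derma', 'ovar']

# Hypothetical tokenized abstracts, invented for this example.
matching_abstract = ['the', 'pancreatic', 'duct', 'was', 'examined']
other_abstract = ['renal', 'function', 'was', 'normal']

print(starts_with_any_stem(matching_abstract, stem_words))  # True
print(starts_with_any_stem(other_abstract, stem_words))     # False
```

Note that the comparison is case-sensitive, so this assumes the tokens and the stems use the same casing.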

If you just want the stem to be a substring of the word then use:

any(x in y for y in word_tokenized_abstract for x in stem_words)
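To see how the prefix check and the substring check differ, here is a small sketch with a made-up word list (only stem_words comes from the question). The word 'hypodermal' contains the stem 'derma' but does not start with it, so the two expressions disagree:

```python
stem_words = ['pancrea', 'muscul', 'derma', 'ovar']

# Hypothetical tokens, invented for this example.
words = ['hypodermal', 'injection']

# Prefix check: a word must begin with a stem.
prefix_match = any(y.startswith(x) for y in words for x in stem_words)

# Substring check: a stem may appear anywhere inside a word.
substring_match = any(x in y for y in words for x in stem_words)

print(prefix_match, substring_match)  # False True
```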
user2314737