2

Given the following data:

a = ["onee", "two", "three"]
b = ["one", "four"]

I would like for some test such as the following:

[True if x in a else False for x in b]

To return

[True, False]

Instead of

[False, False]

So for each element in list b, I want to see whether it's a substring of any of the elements in list a.

One way this can be done is as following:

test = []
for elb in b:
    included = False
    for ela in a:
        if elb in ela:
            included = True
        break
    test.append(included)

I don't feel that this is a very nice approach though, perhaps there's a comprehension that can improve on it?

The following also works:

[True if any(elb in ela for ela in a) else False for elb in b]

I'm just thinking that there's likely to be a nicer approach.

Joe Iddon
  • 20,101
  • 7
  • 33
  • 54
baxx
  • 3,956
  • 6
  • 37
  • 75

4 Answers4

3

First of all, this

True if True else False

is redundant. So in your first comp. you can just have: [x in a for x in b], similarly, [any(elb in ela for ela in a) for elb in b].

And I think this is as short, in terms of characters, that you are going to get it.

Efficiency wise, however, you could pre-generate all possible sub-strings from all the strings in a, storing these in a set.

This would mean that the complexity would be reduced from O(n*m*p), where n is the length of b, m is the length of a, and n is the average sub-string length of a, to simply O(n). This is because, once the sub-string lookup set has been created, checking a particular element in b is an O(1) operation since you are checking for inclusion in a set, rather than O(m*p) where you have to check every sub-string of every element in a.

To generate this sub-string lookup set you could use a set comprehension:

a_substrings = {s[i:j] for s in a for i in range(len(s)) for j in range(i+1, len(s)+1)}

then you can just check in this:

[s in a_substrings for s in b]

which gives the expected [True, False] for your inputs.


Is this really faster?

For small sized a and b lists, the overhead of creating the lookup set would outweigh the advantages of being able to check each element in b. Furthermore, for an extortionately long a list, containing long strings, and even a moderately sized b, it may again be slower to take the time going through all the sub-strings of a and creating the lookup set, especially if the majority of elements in b are going to match within the first few strings of a.

However, in cases where both lists are long, most importantly when b is long, your method would be continuously generating and checking the same elements of a over and over again for each element of b. Clearly this is slower than pre-calculating the sub-sets. I guess this is essentially a key optimisation of search engines - when someone presents a query, they don't start trawling website from a blank slate each time, instead they are continuously re-evaluating all known websites, of course in order of popularity, so that they are "ready to go" when a query comes in.

Joe Iddon
  • 20,101
  • 7
  • 33
  • 54
2

This is another approach I came over with:

[x in "-".join(y for y in a) for x in b]

Joining all the strings of a in one string and testing if the element is in there.

Outputs:

[True, False]

Disclaimer: Not sure if this is precisely "nicer" but well, again, just another approach.

revliscano
  • 2,227
  • 2
  • 12
  • 21
0

This is enough:

[ any(elb in ela for ela in a) for elb in b ]
Błotosmętek
  • 12,717
  • 19
  • 29
0

You can do:

>>> [ y.startswith(x) for x, y in zip(b,a)]
[True, False]
>>> 
Miro
  • 112
  • 1
  • 3
  • What if the the string in `b` is not in the beginning of the string in `a`? For instance: `a = ["konee", "two", "three"]` – revliscano May 03 '20 at 19:48
  • Also, when you do `zip(a, b)` you're reducing the the lookup to the first two elements of `a`, because `len(b) < len(a)` – revliscano May 03 '20 at 19:51
  • Then if we need to report that one as True we need to use y in x. I am just following the question and case 'one' and 'onee' – Miro May 03 '20 at 19:52
  • Yes, this definitely only works for the strictly same case as the question. – revliscano May 03 '20 at 19:53
  • correct we are reducing and optimizing the things, we need to loop over the short list. There no point to loop over the longer one and to check with shorter one – Miro May 03 '20 at 19:53
  • Yes, but again, then it's only right for this specific case, because if you had `a = ["two", "three", "onee"]`, then again, doing `zip(a, b)` would fail. – revliscano May 03 '20 at 19:58
  • Answer depends on question. I am providing answer exactly for this question with one for and built-ins. If need something more specific I could provide different solution. – Miro May 03 '20 at 20:03
  • Alright, buddy ;-) I just thought that the clarification was necessary. – revliscano May 03 '20 at 20:08