First of all, `True if True else False` is redundant — the condition is already a boolean, so you can use it directly. In your first comprehension you can just have `[x in a for x in b]` and, similarly, `[any(elb in ela for ela in a) for elb in b]`.
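As a quick sketch of the difference between the two simplified comprehensions (the lists `a` and `b` here are made-up stand-ins, not the question's actual inputs): the first checks *exact* membership of each element of `b` in `a`, while the second checks whether each element of `b` is a *sub-string* of any element of `a`.

```python
# Hypothetical sample lists, just to illustrate the two comprehensions.
a = ["hello", "world"]
b = ["ello", "xyz"]

# Exact membership: is each element of b literally one of the strings in a?
exact = [x in a for x in b]

# Sub-string containment: does each element of b appear inside some string of a?
sub = [any(elb in ela for ela in a) for elb in b]

print(exact)  # [False, False]  ("ello" is not itself an element of a)
print(sub)    # [True, False]   ("ello" is a sub-string of "hello")
```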
And I think this is about as short, in terms of characters, as you are going to get it.
Efficiency-wise, however, you could pre-generate all possible sub-strings from all the strings in `a`, storing these in a set.
This would mean that the complexity would be reduced from O(n*m*p), where n is the length of `b`, m is the length of `a`, and p is the average length of the strings in `a`, to simply O(n) for the lookups (ignoring the one-off cost of building the set). This is because, once the sub-string lookup set has been created, checking a particular element of `b` is an O(1) operation, since you are checking for inclusion in a set, rather than O(m*p), where you would have to check every sub-string of every element in `a`.
To generate this sub-string lookup set you could use a set comprehension:
    a_substrings = {s[i:j] for s in a for i in range(len(s)) for j in range(i+1, len(s)+1)}
then you can just check `in` this:

    [s in a_substrings for s in b]

which gives the expected `[True, False]` for your inputs.
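Putting the two pieces together as one runnable sketch (again with assumed placeholder lists, since the question's inputs aren't reproduced here):

```python
# Assumed placeholder inputs.
a = ["hello", "world"]
b = ["ello", "xyz"]

# Every contiguous sub-string of every string in a, stored once in a set.
a_substrings = {s[i:j] for s in a
                for i in range(len(s))
                for j in range(i + 1, len(s) + 1)}

# Each check is now a single set-membership test.
result = [s in a_substrings for s in b]
print(result)  # [True, False]
```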
Is this really faster?
For small `a` and `b` lists, the overhead of creating the lookup set would outweigh the advantage of the fast checks on each element of `b`. Furthermore, for an extortionately long `a` list containing long strings, and even a moderately sized `b`, it may again be slower to take the time going through all the sub-strings of `a` and creating the lookup set, especially if the majority of elements in `b` would match within the first few strings of `a` anyway.
However, in cases where both lists are long, and most importantly when `b` is long, your method would be continuously generating and checking the same elements of `a` over and over again for each element of `b`. Clearly this is slower than pre-calculating the sub-strings. I guess this is essentially a key optimisation of search engines: when someone presents a query, they don't start trawling websites from a blank slate each time; instead they are continuously re-evaluating all known websites, of course in order of popularity, so that they are "ready to go" when a query comes in.