Efficiently find strings in list of lists of strings (Python)

Question

I'm looking for an efficient way to find several strings in a list of lists of strings and return their indices. Here is the setup:

inp = [ 'ans1', 'ans2', 'ans3' ]
output = [ [ 'aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd' ],
           [ 'bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa' ],
           [ 'ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb' ] ]

# expected result
# result = [ [ 1, 4, 3 ], [ 4, 2, 2 ], [ -1, -1, -1 ] ]

Each row of the result holds the indices of one string from inp within each sublist of output. For example, ans2 is at index 4 in the first sublist, index 2 in the second sublist, and index 2 in the third sublist, and similarly for ans1. ans3, however, does not appear in any sublist, so the returned index is -1.

What I'm looking for is an efficient way to do this computation (possibly in parallel?) without the classic nested for loops that would obviously work.

Some considerations:

output has shape [ len( inp ), L ], where L is the size of the dictionary. In this case L = 6.
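
For reference, the nested-loop baseline that the question alludes to might look like the sketch below. The function name find_indices_loops is hypothetical and is not part of the original post; it is only a point of comparison for the answers.

```python
def find_indices_loops(inp, output):
    """For each string in inp, record its first index in each sublist, or -1."""
    result = []
    for s in inp:
        row = []
        for sublist in output:
            idx = -1  # sentinel used in the question for "not found"
            for i, item in enumerate(sublist):
                if item == s:
                    idx = i
                    break  # stop at the first occurrence
            row.append(idx)
        result.append(row)
    return result


inp = ['ans1', 'ans2', 'ans3']
output = [['aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd'],
          ['bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa'],
          ['ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb']]

print(find_indices_loops(inp, output))  # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
```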

I'm sorry, I tried the usual nested for loops, but I'm after performance, and that's what I asked about, since I sincerely do not know where to start. – l4plac3 Jul 07 '21 at 15:46
https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop – Pranav Hosangadi Jul 07 '21 at 15:49

alexnik42 · Answer 1 · 2021-07-07T15:58:35.043

1

You can try a list comprehension:

result = [[o.index(s) if s in o else -1 for o in output] for s in inp]
print(result) # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]

Update:

Also, it's probably not the best idea to store -1 as the index for strings that are not present in the output list. -1 is a valid index in Python, so it may lead to silent errors later if you use the indices stored in the result to index back into the lists.
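
To illustrate the concern, here is a small sketch of the pitfall, using None as one possible alternative sentinel. The choice of None and the name find_indices_none are assumptions made for this example, not something the question requires.

```python
row = ['aaa', 'ans1', 'bbb']

# -1 is a valid index: it silently returns the last element.
print(row[-1])  # bbb


def find_indices_none(inp, output):
    """Same comprehension as above, but with None marking a missing string."""
    return [[o.index(s) if s in o else None for o in output] for s in inp]


inp = ['ans1', 'ans2', 'ans3']
output = [['aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd'],
          ['bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa'],
          ['ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb']]

result = find_indices_none(inp, output)
print(result)  # [[1, 4, 3], [4, 2, 2], [None, None, None]]

# Using None as an index fails loudly instead of returning a wrong element.
try:
    row[None]
except TypeError:
    print("None cannot be used as a list index")
```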

edited Jul 07 '21 at 15:58

answered Jul 07 '21 at 15:35


@not_speshal agreed, using variable "input" will lead to an error, fixed it in the post – alexnik42 Jul 07 '21 at 15:37
It will actually NOT lead to an error. It's just bad practice. – not_speshal Jul 07 '21 at 15:37
Minor correction: The `//` should be `#`. – Book Of Flames Jul 07 '21 at 15:43
Is there any chance to improve the performances and possibly do the calculation in parallel for each different string in `inp`? – l4plac3 Jul 07 '21 at 15:44
A list comprehension is usually no more or less efficient than a regular loop for simple things like this. In this case, it's _less_ efficient because `if s in o` already traverses the list once, and `o.index(s)` does it again. – Pranav Hosangadi Jul 07 '21 at 15:45
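
The double traversal mentioned in the comment above can be avoided by calling index once and catching the ValueError it raises when the string is absent. This is a minimal sketch; the helper name first_index and its default parameter are hypothetical.

```python
def first_index(seq, value, default=-1):
    """Return the first index of value in seq, or default if it is absent."""
    try:
        return seq.index(value)  # single traversal of the list
    except ValueError:
        return default


inp = ['ans1', 'ans2', 'ans3']
output = [['aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd'],
          ['bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa'],
          ['ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb']]

result = [[first_index(o, s) for o in output] for s in inp]
print(result)  # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
```

Each lookup is still linear in the length of the sublist, so this only removes the repeated scan; it does not change the overall complexity.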

score 0 · Answer 2 · answered Jul 07 '21 at 15:51

You can build a dictionary index for each sublist first to speed up the search:

inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

tmp = [{v: i for i, v in enumerate(subl)} for subl in output]

result = [[d.get(i, -1) for d in tmp] for i in inp]
print(result)

Prints:

[[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
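
One caveat worth noting: when a sublist contains duplicates, the dict comprehension above keeps the last index of each string, whereas list.index returns the first. If first-occurrence semantics matter, a variant using setdefault is one way to preserve them. This is a sketch; the helper name build_first_index is hypothetical.

```python
def build_first_index(sublist):
    """Map each string to the index of its first occurrence in sublist."""
    index = {}
    for i, v in enumerate(sublist):
        index.setdefault(v, i)  # keeps the earliest index, ignores later repeats
    return index


inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

# 'aaa' appears at positions 1 and 5 in the second sublist.
last_wins = {v: i for i, v in enumerate(output[1])}
first_wins = build_first_index(output[1])
print(last_wins["aaa"], first_wins["aaa"])  # 5 1

tmp = [build_first_index(subl) for subl in output]
result = [[d.get(s, -1) for d in tmp] for s in inp]
print(result)  # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
```

Building the dictionaries costs one pass over each sublist, after which every lookup takes constant time on average, so this pays off mainly when inp contains many strings.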
