Finding indices of items from a list in another list even if they repeat

Question

This answer works very well for finding the indices of items from one list in another list, but it returns each index only once. I would like my list of indices to have the same length as the list of values I am searching for. Here is an example:

thelist = ['A','B','C','D','E'] # the list whose indices I want
Mylist = ['B','C','B','E'] # my list of values that I am searching in the other list
ilist = [i for i, x in enumerate(thelist) if any(thing in x for thing in Mylist)]

With this solution, ilist = [1,2,4], but what I want is ilist = [1,2,1,4], so that len(ilist) == len(Mylist). The comprehension iterates over thelist, so each index can appear at most once; when items repeat in Mylist, the repeated indices are missing.

Are you looking for substrings like that question was, or just exact matches? — Ry-, Jun 30 '17 at 04:10
at the moment exact matches are fine, but a substring seems more robust — durbachit, Jun 30 '17 at 04:16
That sounds kind of suspect. What’s the actual purpose? (Lots of mistakes happen under the guise of robustness.) — Ry-, Jun 30 '17 at 04:39

user94559 · Accepted Answer · 2017-07-01T20:55:44.317

2

thelist = ['A','B','C','D','E']
Mylist = ['B','C','B','E']
ilist = [thelist.index(x) for x in Mylist]

print(ilist)  # [1, 2, 1, 4]

Basically, "for each element of Mylist, get its position in thelist."

This assumes that every element in Mylist exists in thelist. If the element occurs in thelist more than once, it takes the first location.
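If that assumption does not hold, `list.index` raises `ValueError` for the missing element. A minimal sketch of one way to handle it, assuming you want a placeholder such as `None` in that position (the helper name `index_or_default` is hypothetical, not part of the original answer):

```python
thelist = ['A', 'B', 'C', 'D', 'E']
Mylist = ['B', 'C', 'Z', 'E']  # 'Z' does not occur in thelist


def index_or_default(seq, value, default=None):
    """Return the first index of value in seq, or default if it is absent."""
    try:
        return seq.index(value)
    except ValueError:
        # list.index raises ValueError when the value is not found
        return default


ilist = [index_or_default(thelist, x) for x in Mylist]
print(ilist)  # [1, 2, None, 4]
```

The result still has the same length as Mylist, so positions line up even when some values are missing.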

UPDATE

For substrings:

thelist = ['A','boB','C','D','E']
Mylist = ['B','C','B','E']
ilist = [next(i for i, y in enumerate(thelist) if x in y) for x in Mylist]

print(ilist)  # [1, 2, 1, 4]
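Note that `next(...)` without a default raises `StopIteration` when no element of thelist contains the substring. A hedged variant, assuming `None` is an acceptable marker for "no match", passes a default as the second argument to `next`:

```python
thelist = ['A', 'boB', 'C', 'D', 'E']
Mylist = ['B', 'C', 'Z', 'E']  # 'Z' is not a substring of any element

ilist = [
    # The generator needs its own parentheses when next() gets a default.
    next((i for i, y in enumerate(thelist) if x in y), None)
    for x in Mylist
]

print(ilist)  # [1, 2, None, 4]
```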

UPDATE 2

Here's a version that does substrings in the other direction using the example in the comments below:

thelist = ['A','B','C','D','E']
Mylist = ['Boo','Cup','Bee','Eerr','Cool','Aah']

ilist = [next(i for i, y in enumerate(thelist) if y in x) for x in Mylist]

print(ilist)  # [1, 2, 1, 4, 2, 0]

edited Jul 01 '17 at 20:55

answered Jun 30 '17 at 04:13

user94559


Oh I see, got the original question wrong here with the substrings, I am looking for the opposite - suppose `thelist = ['A','B','C','D','E']` and `Mylist = ['Boo','Cup','Bee','Eerr','Cool','Aah']` and the desired output would be `[1,2,1,4,2,0]` – durbachit Jul 01 '17 at 02:57
Then just change `if x in y` to `if y in x`. – user94559 Jul 01 '17 at 20:53

score 1 · Answer 2 · answered Jun 30 '17 at 04:14

The code below works:

ilist = [thelist.index(i) for i in Mylist]

Jay Parikh


score 1 · Answer 3 · answered Jun 30 '17 at 04:41

Make a reverse lookup from strings to indices:

string_indices = {c: i for i, c in enumerate(thelist)}
ilist = [string_indices[c] for c in Mylist]

This avoids the quadratic behaviour of repeated .index() lookups.
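One difference from `.index()` is worth noting: if thelist itself contains duplicates, a dict comprehension keeps the last index for each string, whereas `.index()` returns the first. A small sketch that keeps the first index instead, assuming that is the behaviour you want (the duplicated 'B' in thelist is added here only for illustration):

```python
thelist = ['A', 'B', 'C', 'B', 'E']  # 'B' appears at indices 1 and 3
Mylist = ['B', 'C', 'B', 'E']

string_indices = {}
for i, c in enumerate(thelist):
    # setdefault only stores the index the first time a string is seen
    string_indices.setdefault(c, i)

ilist = [string_indices[c] for c in Mylist]
print(ilist)  # [1, 2, 1, 4]
```

As with `.index()`, a value missing from thelist raises an exception (`KeyError` here); `string_indices.get(c)` would return `None` instead.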

Ry-


score 0 · Answer 4 · answered Jun 30 '17 at 04:37

If your data can be implicitly converted to an ndarray, as your example implies, you could use numpy_indexed (disclaimer: I am its author) to perform this kind of operation in an efficient (fully vectorized, N log N) manner.

import numpy_indexed as npi
ilist = npi.indices(thelist, Mylist)

npi.indices is essentially the array-generalization of list.index. Also, it has a kwarg to give you control over how to deal with missing values and such.
