I need to find the starting index of specific sequences of strings (multi-word phrases) in a list in Python.

For example, given this list:

list_ = ['In', 'a', 'gesture', 'sure', 'to', 'rattle', 'the', 'Chinese', 'Government', ',', 'Steven', 'Spielberg', 'pulled', 'out', 'of', 'the', 'Beijing', 'Olympics', 'to', 'protest', 'against', 'China', '_s', 'backing', 'for', 'Sudan', '_s', 'policy', 'in', 'Darfur', '.']

and these sequences:

seq0 = "Steven Spielberg"
seq1 = "the Chinese Government"
seq2 = "the Beijing Olympics"

The expected output is:

10
6
15
Filip Młynarski
blueWings

2 Answers

You could simply iterate over the list of words and check, at every index, whether the words starting there match any of your sequences.

words = ['In', 'a', 'gesture', 'sure', 'to', 'rattle', 'the', 'Chinese', 'Government', ',', 'Steven', 'Spielberg', 'pulled', 'out', 'of', 'the', 'Beijing', 'Olympics', 'to', 'protest', 'against', 'China', '_s', 'backing', 'for', 'Sudan', '_s', 'policy', 'in', 'Darfur', '.']

seq0 = "Steven Spielberg"
seq1 = "the Chinese Government"
seq2 = "the Beijing Olympics"

sequences = {'seq{}'.format(idx): i.split() for idx, i in enumerate([seq0, seq1, seq2])}

for idx in range(len(words)):
    for k, v in sequences.items():
        # A slice that runs past the end is shorter than v and cannot match,
        # so no explicit bounds check is needed.
        if words[idx:idx + len(v)] == v:
            print(k, idx)

Output:

seq1 6
seq0 10
seq2 15
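
The same sliding-window comparison can also be wrapped in a reusable function. The following is only a minimal sketch, not part of the original answer: the name `find_all_starts` is hypothetical, and it assumes you want every start index of a phrase rather than a printed line per match. It also covers a phrase that ends exactly at the last token.

```python
def find_all_starts(words, phrase):
    """Return every index at which the tokens of `phrase` start in `words`.

    `find_all_starts` is an illustrative name, not an existing API.
    """
    target = phrase.split()
    if not target:
        return []  # an empty phrase has no meaningful start index
    n = len(target)
    # Only windows that fit entirely inside `words` are examined.
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]


words = ['In', 'a', 'gesture', 'sure', 'to', 'rattle', 'the', 'Chinese',
         'Government', ',', 'Steven', 'Spielberg', 'pulled', 'out', 'of',
         'the', 'Beijing', 'Olympics', 'to', 'protest', 'against', 'China',
         '_s', 'backing', 'for', 'Sudan', '_s', 'policy', 'in', 'Darfur', '.']

print(find_all_starts(words, "Steven Spielberg"))  # [10]
print(find_all_starts(words, "to"))                # [4, 18]
print(find_all_starts(words, "in Darfur ."))       # [28]
```

Returning a list keeps repeated phrases visible; an empty list means the phrase does not occur.
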
Filip Młynarski

You can do something like:

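def find_sequence(seq, list_):
    seq_list = seq.split()
    # Indices where the first word of the sequence occurs
    candidates = [i for i, x in enumerate(list_) if x == seq_list[0]]
    # Keep only the candidates where the whole sequence matches
    all_occurrence = [idx for idx in candidates if seq_list == list_[idx:idx + len(seq_list)]]
    return -1 if not all_occurrence else all_occurrence[0]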

Usage (with list_ holding the word list from the question):

for seq in [seq0, seq1, seq2]:
    print(find_sequence(seq, list_))

Output:

10
6
15

Note that if the sequence is not found, the function returns -1.
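
To illustrate the -1 convention, here is a small self-contained sketch. It is a variant rather than the answer's own code: the name `first_start` and the short `tokens` list are made up for the example, and it uses a generator with `next(..., -1)` so the search stops at the first match.

```python
def first_start(phrase, tokens):
    """Return the first index where `phrase` starts in `tokens`, or -1.

    `first_start` is a hypothetical helper used only for illustration.
    """
    target = phrase.split()
    n = len(target)
    if n == 0:
        return -1  # assumption: treat an empty phrase as "not found"
    # Lazily test each window; next() stops at the first match.
    matches = (i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)
    return next(matches, -1)


tokens = ['to', 'protest', 'against', 'China', 'to', 'protest']

print(first_start("to protest", tokens))    # 0 (first of two occurrences)
print(first_start("against China", tokens))  # 2
print(first_start("the Olympics", tokens))   # -1 (not present)
```

Because -1 is also a valid Python index (the last element), callers should compare the result against -1 before using it to index the list.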

Rajan