Find the words in front of and behind a word in a Python list

Question

This is related to the following question: Searching for Unicode characters in Python

I have a string like this:

sentence = 'AASFG BBBSDC FEKGG SDFGF'

I split it and get a list of words like this:

words = ['AASFG', 'BBBSDC', 'FEKGG', 'SDFGF']

I search for part of a word using the following code and get the whole word:

[word for word in sentence.split() if word.endswith("GG")]

It returns ['FEKGG'].

Now I need to find out what is in front of and behind that word.

For example, when I search for "GG" it returns ['FEKGG']. It should also be able to give me:

behind = 'BBBSDC'
infront = 'SDFGF'

Can you please select a valid answer if one of us gave what you needed? — DevLounge, Aug 12 '13 at 11:08

score 3 · Accepted Answer · edited May 23 '17 at 10:31

Given the following string (edited from the original), you can use the generator below:

sentence = 'AASFG BBBSDC FEKGG SDFGF KETGG'

def neighborhood(iterable):
    iterator = iter(iterable)
    prev = None
    try:
        item = next(iterator)
    except StopIteration:
        return  # empty input: yield nothing
    # 'nxt' rather than 'next', so the built-in next() stays usable above
    for nxt in iterator:
        yield (prev, item, nxt)
        prev = item
        item = nxt
    yield (prev, item, None)

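matches = [word for word in sentence.split() if word.endswith("GG")]
results = []

# 'nxt' avoids shadowing the built-in next(); the membership test
# records each occurrence once, even if a matching word repeats
for prev, item, nxt in neighborhood(sentence.split()):
    if item in matches:
        results.append((prev, item, nxt))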

results then contains:

[('BBBSDC', 'FEKGG', 'SDFGF'), ('SDFGF', 'KETGG', None)]

Marcelo Cantos · Answer 2 · 2013-08-11T11:10:44.430

Here's one possibility:

words = sentence.split()
[pos] = [i for (i, word) in enumerate(words) if word.endswith("GG") ]
behind = words[pos - 1]
infront = words[pos + 1]

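You might need to take care with edge cases, such as "…GG" not appearing, appearing more than once, or being the first and/or last word. As it stands, no match or more than one match raises a ValueError from the unpacking, and a match in the last position raises an IndexError, which may well be the correct behaviour. A match in the first position does not raise, though: words[pos - 1] becomes words[-1] and silently returns the last word.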
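The edge cases mentioned above can also be handled explicitly instead of relying on exceptions. A minimal sketch, assuming that None is an acceptable placeholder for a missing neighbour and that every match should be reported; the function name neighbours is hypothetical and not part of the original answer:

```python
def neighbours(sentence, suffix):
    """Return a (behind, match, infront) tuple for every word ending with suffix."""
    words = sentence.split()
    results = []
    for i, word in enumerate(words):
        if word.endswith(suffix):
            # Guard both ends explicitly; words[i - 1] with i == 0 would
            # otherwise wrap around to the last word.
            behind = words[i - 1] if i > 0 else None
            infront = words[i + 1] if i + 1 < len(words) else None
            results.append((behind, word, infront))
    return results


print(neighbours('AASFG BBBSDC FEKGG SDFGF', 'GG'))
# [('BBBSDC', 'FEKGG', 'SDFGF')]
```

An empty list means no word matched, so the caller can decide whether that is an error.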

A completely different solution using regexes avoids splitting the string into an array in the first place:

import re

match = re.search(r'\b(\w+)\s+(?:\w+GG)\s+(\w+)\b', sentence)
(behind, infront) = match.groups()
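Note that re.search returns None when nothing matches, so calling .groups() on the result fails with an AttributeError in that case. The sketch below wraps the same pattern with a guard; the helper name regex_neighbours and its suffix parameter are assumptions added for illustration. Because the pattern requires a word on both sides, a match in the first or last position is also reported as None here.

```python
import re


def regex_neighbours(sentence, suffix="GG"):
    """Return (behind, infront) for the first word ending with suffix, or None."""
    # re.escape keeps any special characters in the suffix literal.
    pattern = r'\b(\w+)\s+\w+' + re.escape(suffix) + r'\s+(\w+)\b'
    match = re.search(pattern, sentence)
    if match is None:
        return None  # no match, or the match has no word on one side
    return match.groups()


print(regex_neighbours('AASFG BBBSDC FEKGG SDFGF'))
# ('BBBSDC', 'SDFGF')
```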

score 1 · Answer 3 · answered Aug 11 '13 at 20:37

This is one way. The behind and infront elements will be None if the "GG" word is at the beginning or end of the sentence, respectively.

words = sentence.split()
# behind is the preceding word and infront the following one, as in the question
[(behind, word, infront) for (behind, word, infront) in
 zip([None] + words[:-1], words, words[1:] + [None])
 if word.endswith("GG")]

DevLounge · Answer 4 · score 1 · 2013-08-11T22:25:08.990

sentence = 'AASFG BBBSDC FEKGG SDFGF AAABGG FOOO EEEGG'

def make_trigrams(words):
    # Pad both ends so the first and last words still get a full triple.
    padded = [None] + words + [None]

    for i in range(len(padded) - 2):
        yield (padded[i], padded[i + 1], padded[i + 2])


for behind, match, infront in make_trigrams(sentence.split()):
    if match.endswith('GG'):
        print('Behind:', behind)
        print('Match:', match)
        print('Infront:', infront, '\n')

Output:

Behind: BBBSDC
Match: FEKGG
Infront: SDFGF

Behind: SDFGF
Match: AAABGG
Infront: FOOO

Behind: FOOO
Match: EEEGG
Infront: None

edited Aug 11 '13 at 22:25

answered Aug 11 '13 at 21:32

This is what you were looking for, hopefully. – DevLounge Aug 11 '13 at 22:27

theodox · Answer 5 · score 1 · 2013-08-11T23:10:11.540

Another itertools-based option, which may be more memory-friendly on large datasets:

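from itertools import tee

def sentence_targets(sentence, endstring):
    before, target, after = tee(sentence.split(), 3)
    # Offset the iterators so each zip step yields three consecutive words.
    # The None default keeps next() from raising on very short input.
    next(target, None)
    next(after, None)
    next(after, None)
    # Without padding, the first and last words are never the middle item.
    for trigram in zip(before, target, after):  # itertools.izip on Python 2
        if trigram[1].endswith(endstring):
            yield trigram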

EDIT: fixed typo
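One limitation of the tee-based version is that a matching word in the first or last position is never reported, because the three iterators are not padded. A possible variant that pads lazily is sketched below for Python 3; the names padded_trigrams and sentence_targets_padded are hypothetical, and a collections.deque is swapped in for tee to hold the three-word window.

```python
from collections import deque
from itertools import chain


def padded_trigrams(words):
    """Lazily yield (behind, word, infront), padding both ends with None."""
    window = deque(maxlen=3)  # holds only the last three items seen
    for item in chain([None], words, [None]):
        window.append(item)
        if len(window) == 3:
            yield tuple(window)


def sentence_targets_padded(sentence, endstring):
    # The middle item of each window is always a real word, never the padding.
    for behind, word, infront in padded_trigrams(sentence.split()):
        if word.endswith(endstring):
            yield (behind, word, infront)


for trigram in sentence_targets_padded('AASFG BBBSDC FEKGG SDFGF KETGG', 'GG'):
    print(trigram)
```

sentence.split() still builds the full word list, so any memory saving only applies if padded_trigrams is fed a genuinely lazy iterable of words.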

edited Aug 11 '13 at 23:10

answered Aug 11 '13 at 21:53

AttributeError: 'itertools.tee' object has no attribute 'endswith' – DevLounge Aug 11 '13 at 22:32
