1

I have a long list of strings which contain substrings of interest in the order they are given, but here is a small example using sentences in a text file:

This is a long drawn out sentence needed to emphasize a topic I am trying to learn.
It is new idea for me and I need your help with it please!
Thank you so much in advance, I really appreciate it.

From this text file, I would like to find any sentences that contain both "I" and "need" but they must occur in that order.

So in this example, 'I' and 'need' both occur in sentence 1 and sentence 2 but in sentence 1 they are in the wrong order, so I do not want to return that. I only want to return the second sentence, as it has 'I need' in order.

I have used this example to identify the substrings, but I cannot figure out how to only find them in order:

id1 = "I"
id2 = "need"

with open('fun.txt') as f:
    for line in f:
        if id1 and id2 in line:
            print(line[:-1])

This returns:

This is a long drawn out sentence needed to emphasize a topic I am trying to learn.
It is new idea for me and I need your help with it please!

But I want only:

It is new idea for me and I need your help with it please!

Thanks!

  • check my answer here https://stackoverflow.com/a/53890918/4046632 Same applies for `if id1 and id2 in line:`. – buran Dec 21 '18 at 21:43
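To spell out the point in the comment above: `id1 and id2 in line` is parsed as `id1 and (id2 in line)`, so a non-empty `id1` is simply truthy and only `id2` is actually tested. A minimal sketch of this, where the sample line is made up for illustration and is not from the question:

```python
id1 = "I"
id2 = "need"
line = "We need more time."  # hypothetical line: has "need" but no capital "I"

# `in` binds tighter than `and`, so this evaluates id1 and (id2 in line).
buggy = bool(id1 and id2 in line)

# Testing each substring separately checks both (but still not their order).
fixed = id1 in line and id2 in line

print(buggy, fixed)  # True False
```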

4 Answers

1

You need to identify id2 in the portion of the line after id1:

infile = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]

id1 = "I"
id2 = "need"

for line in infile:
    if id1 in line:
        pos1 = line.index(id1)
        if id2 in line[pos1+len(id1) :] :
            print(line)

Output:

It is new idea for me and I need your help with it please!
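The same idea can be wrapped in a small reusable function. This is only a sketch: the name `contains_in_order` is made up here, and it uses `str.find` with a start offset instead of slicing.

```python
def contains_in_order(line, first, second):
    """Return True if `second` occurs somewhere after an occurrence of `first`."""
    pos = line.find(first)  # -1 when `first` is absent
    if pos == -1:
        return False
    # Search only the text after the earliest occurrence of `first`.
    return line.find(second, pos + len(first)) != -1


sentences = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]

matches = [s for s in sentences if contains_in_order(s, "I", "need")]
print(matches)
```

Like the loop above, this matches substrings rather than whole words, so the "I" in "It" counts as an occurrence of `id1`.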
Prune
1

You can use a regular expression to check for this. One possible solution is this:

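import re

id1 = "I"
id2 = "need"
# Escape the substrings so regex metacharacters in them are matched literally
regex = re.compile(r'^.*{}.*{}.*$'.format(re.escape(id1), re.escape(id2)))

with open('fun.txt') as f:
    for line in f:
        if regex.search(line):
            print(line.rstrip('\n'))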
Felix
  • 1
    nice! an improvement would be to use `regex = re.compile(r'{}.*{}'.format(id1, id2))` and `regex.search(line)` – ic3b3rg Dec 21 '18 at 21:55
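A sketch of the shorter pattern suggested in the comment, with two additions that are assumptions on my part and not part of the answer: `re.escape` on both substrings, and an optional `whole_words` flag that wraps each one in `\b` word boundaries. The helper name `build_ordered_pattern` is hypothetical.

```python
import re


def build_ordered_pattern(first, second, whole_words=False):
    """Compile a pattern matching `first` followed later by `second`."""
    first, second = re.escape(first), re.escape(second)
    if whole_words:
        # \b keeps "I" from matching inside "It" and "need" inside "needed".
        first, second = rf"\b{first}\b", rf"\b{second}\b"
    return re.compile(f"{first}.*{second}")


loose = build_ordered_pattern("I", "need")
strict = build_ordered_pattern("I", "need", whole_words=True)

line = "It was needed."
print(bool(loose.search(line)))   # True: substring match
print(bool(strict.search(line)))  # False: no whole-word "I" or "need"
```

Whether whole-word matching is wanted depends on the data; the question's own example relies on plain substring matching.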
0

Just do

import re
match = re.search('pattern', 'yourString')

https://developers.google.com/edu/python/regular-expressions

So the pattern you are looking for is 'I(.*)need' (see "Regex Match all characters between two strings"). You may have to construct your pattern differently, as I don't know whether there are exceptions. If so, you can run the regex twice: once to get a subset of your original string, and again to get the exact match you want.

melp
  • 1
    This is a general use case, but it is not a single pattern. If `'I'` is the first word in the sentence and `'need'` is the last word such as `'I have everything we need.'` I still want to return this sentence. –  Dec 21 '18 at 21:48
  • Using regex is certainly a good idea here, but your answer does not satisfy the question. Both the pattern `I\sneed` and the code `re.compile('pattern','yourString' )` are wrong. Please improve, or delete. – FabienP Dec 21 '18 at 21:49
0

You can define a function that computes the intersection of two sets (the words of each sentence and the words in I need), then use sorted with a key that orders the result by position of appearance in the sentence. You can then check whether the resulting list's order matches the order in I need:

a = ['I','need']
l = ['This is a long drawn out sentence needed to emphasize a topic I am trying to learn.',
'It is new idea for me and I need your help with it please!',
'Thank you so much in advance, I really appreciate it.']

A user-defined function that returns True if the strings appear in the sentence in the same order:

def same_order(l1, l2):
    words = l2.split(' ')  # split once and reuse
    # Keep the words common to both, ordered by first appearance in the sentence
    inters = sorted(set(l1) & set(words), key=words.index)
    return inters == l1

Keep each string in the list l for which the function returns True:

[s for s in l if same_order(a, s)]
#['It is new idea for me and I need your help with it please!']
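A possible variation, sketched under my own assumptions rather than taken from the answer: treat the sentence as a sequence of words and check that the wanted words appear as a subsequence. The name `words_in_order` is made up, and `\w+` is assumed to be an acceptable definition of a word, which also strips punctuation such as the period in "need.".

```python
import re


def words_in_order(wanted, sentence):
    """Return True if the words in `wanted` appear in `sentence` in that order."""
    words = iter(re.findall(r"\w+", sentence))
    # `w in words` advances the iterator up to and including the match,
    # so each later word is searched for only after the previous one.
    return all(w in words for w in wanted)


l = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]

result = [s for s in l if words_in_order(["I", "need"], s)]
print(result)
```

Unlike the set-based version, this also accepts a sentence such as "need I need", where a wanted word first appears out of order but a later occurrence is in order.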
yatu