How to detect the third line before a certain line

Question

I have a file, like this:

<prop type="ltattr-match">1-1</prop>
id =>3</prop>
<tuv xml:lang="en">
<seg> He is not a good man </seg>

What I want is to detect the third line before the line `He is not a good man`, i.e. `id =>3`. The file is big. What can I do?

The THIRD line before it or the SECOND line before it? – Hyperboreus Apr 25 '14 at 15:55
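To make the "third vs. second" question concrete, here is a small sketch that indexes the sample lines from the question, stored as a Python list called `sample` (a name chosen only for this illustration). The `<seg>` line has index 3, so the `<prop ...>` line is the third line before it and `id =>3</prop>` is the second.

```python
# The four sample lines from the question (hypothetical in-memory copy).
sample = [
    '<prop type="ltattr-match">1-1</prop>',
    'id =>3</prop>',
    '<tuv xml:lang="en">',
    '<seg> He is not a good man </seg>',
]

# Index of the first line that starts with "<seg>".
seg_idx = next(i for i, line in enumerate(sample) if line.startswith("<seg>"))

print(seg_idx)               # 3
print(sample[seg_idx - 3])   # <prop type="ltattr-match">1-1</prop>  (third line before)
print(sample[seg_idx - 2])   # id =>3</prop>  (second line before)
```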

Jasper · Answer 1 · 2014-04-25T16:37:33.903

I suggest using a double-ended queue with a maximum length: this way, only the required amount of "backlog" is stored and you don't have to fiddle around with slices manually. We don't need the "double-endedness", but the normal Queue class blocks when the queue is full.

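import collections

def third_line_before(path):
    dq = collections.deque([], 3)        # create an empty queue holding at most 3 lines
    with open(path) as file:
        for line in file:                # iterate lazily instead of readlines()
            if line.startswith('<seg>'):
                # dq[0] is the third line before only if 3 lines are stored
                return dq[0] if len(dq) == 3 else None   # or add to a list
            dq.append(line)              # save the line; if 3 lines are already
                                         # stored, the oldest one is discarded
    return None                          # no matching line

print(third_line_before("mybigfile.txt"))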
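As a quick check of why `dq[0]` is the line we want, this tiny sketch shows how a `collections.deque` with `maxlen=3` behaves: once it is full, each `append` silently drops the oldest item, so index 0 is always the oldest of the last three items seen. The placeholder strings stand in for file lines.

```python
import collections

dq = collections.deque(maxlen=3)   # equivalent to collections.deque([], 3)
for line in ["line 1", "line 2", "line 3", "line 4"]:
    dq.append(line)                # "line 1" is discarded when "line 4" arrives

print(list(dq))   # ['line 2', 'line 3', 'line 4']
print(dq[0])      # line 2
```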

Scott Hunter · Answer 2 · 2014-04-25T15:53:03.957


Read each line in sequence, remembering only the last 3 read at any point.

Something like:

# Assume f is a file object open to your file
last3 = [f.readline() for _ in range(3)]
while True:
    line = f.readline()
    if not line:                    # end of file reached without a match
        break
    if line.startswith('<seg>'):    # replace with your own condition
        break
    last3 = last3[1:] + [line]
# If a line matched, last3[0] is now 3 lines before the matching line

You'll need to modify this to handle files with fewer than 3 lines, or the case where no line matches your condition.
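One possible way to cover both of those cases, keeping the same list-slicing idea, is sketched below. The function name `find_third_line_before` and the choice to return `None` for "no usable match" are assumptions made for this illustration; it stops at the first matching line.

```python
import io

def find_third_line_before(f, condition):
    """Return the line 3 lines before the first line where condition(line) is true.

    Returns None if no line matches, or if the first match has fewer
    than 3 lines before it.
    """
    last3 = []
    for line in f:                    # iteration ends cleanly at end of file
        if condition(line):
            return last3[0] if len(last3) == 3 else None
        last3 = last3[-2:] + [line]   # keep only the 3 most recent lines
    return None

sample = io.StringIO(
    '<prop type="ltattr-match">1-1</prop>\n'
    'id =>3</prop>\n'
    '<tuv xml:lang="en">\n'
    '<seg> He is not a good man </seg>\n'
)
result = find_third_line_before(sample, lambda line: line.startswith("<seg>"))
print(result.rstrip("\n"))   # <prop type="ltattr-match">1-1</prop>
```

In real use you would pass an open file object instead of the `io.StringIO` sample.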

edited Apr 25 '14 at 15:53

answered Apr 25 '14 at 15:42

Scott Hunter


@Scott Hunter Can you please write this as code? – sss Apr 25 '14 at 15:44

score 1 · Accepted Answer · edited Jan 18 '21 at 12:35


with open("mybigfile.txt") as file:
    lines = file.readlines()

for idx, line in enumerate(lines):
    # idx >= 3 avoids a negative index, which would silently wrap around
    if line.startswith("<seg>") and idx >= 3:
        line_to_detect = lines[idx-3]
        # use idx-2 if you want the _second_ line before this one,
        # e.g. `id =>3</prop>`
        print("This line was detected:")
        print(line_to_detect.rstrip("\n"))

Result:

This line was detected:
<prop type="ltattr-match">1-1</prop>

As we previously discussed in chat, this method can be memory intensive for very large files, because readlines() loads the whole file into memory at once. But 100 pages isn't very large, so this should be fine.
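If the file ever does grow too large for readlines(), one memory-bounded alternative is sketched below. It is a different technique from the deque in another answer: `itertools.tee` makes four copies of the line iterator, each copy is advanced by 0 to 3 lines, and `zip` then yields sliding windows of four consecutive lines, so only a few lines are buffered at a time. The function name `third_line_before_matches` and the `<seg>` default are assumptions for this illustration; it yields the third line before every match.

```python
import io
import itertools

def third_line_before_matches(f, marker="<seg>"):
    """Yield the line 3 lines before every line that starts with marker."""
    its = itertools.tee(f, 4)
    # Stagger the copies: copy k starts k lines ahead of copy 0.
    for offset, it in enumerate(its):
        for _ in range(offset):
            next(it, None)
    # Each window is (line[k], line[k+1], line[k+2], line[k+3]).
    for window in zip(*its):
        if window[3].startswith(marker):
            yield window[0]

sample = io.StringIO(
    '<prop type="ltattr-match">1-1</prop>\n'
    'id =>3</prop>\n'
    '<tuv xml:lang="en">\n'
    '<seg> He is not a good man </seg>\n'
)
matches = list(third_line_before_matches(sample))
for found in matches:
    print(found.rstrip("\n"))   # <prop type="ltattr-match">1-1</prop>
```

Matches in the first three lines have no third line before them and are simply skipped.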

edited Jan 18 '21 at 12:35

Community


answered Apr 25 '14 at 15:55

Kevin


Where did you get the "100 pages" from? – Scott Hunter Apr 25 '14 at 15:59
From chat, [here](http://chat.stackoverflow.com/transcript/message/16074775#16074775) specifically. – Kevin Apr 25 '14 at 16:13

llrs · Answer 4 · 2014-04-25T15:58:11.850


file = "path/to/the/file"
with open(file, "r") as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    # i >= 3 avoids a negative index, which would wrap around to the end
    if "<seg> He is not a good man </seg>" in line and i >= 3:
        print(lines[i-3])  # print the third line before the matching line

If you need the second line before, just change it to print(lines[i-2]).

edited Apr 25 '14 at 15:58

answered Apr 25 '14 at 15:44

llrs


This will have `line` be empty. It certainly won't be the third line, much less "the third line before the line He is not a good man". – DSM Apr 25 '14 at 15:46
And OP isn't looking for the 3rd line, but the 3rd line BEFORE one with specific content. – Scott Hunter Apr 25 '14 at 15:47
@DSM I thought you didn't need to use the loop variable. Now it checks each line, and finds the match even if it occurs more than once. – llrs Apr 25 '14 at 15:56
