1

I am text mining a large document. I want to extract a specific line.

CONTINUED ON NEXT PAGE   CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 4 OF 16 PAGES  

SPE2DH-20-T-0133   SECTION B  

PR: 0081939954   NSN/MATERIAL: 6530015627381

ITEM DESCRIPTION

BOTTLE, SAFETY CAP

BOTTLE, SAFETY CAP   RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT

RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:

I want to extract the description immediately under ITEM DESCRIPTION.

I have made many unsuccessful attempts.

My latest attempt was:

for line in text:
    if 'ITEM' and 'DESCRIPTION'in line:
        print ('Possibe Descript:\n', line)

But it did not find the text.

Is there a way to find ITEM DESCRIPTION and get the line after it or something similar?

Georgy
e.iluf
  • Can you please show us some more code, how you open your file for instance, and also the expected output? – Silveris Oct 11 '19 at 13:14
  • Possible duplicate of [Python - Extracting next line of text file](https://stackoverflow.com/questions/49606909/python-extracting-next-line-of-text-file) – Georgy Oct 11 '19 at 13:23
  • Also, see here: [Check if multiple strings exist in another string](https://stackoverflow.com/q/3389574/7851470) – Georgy Oct 11 '19 at 13:24
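
As the linked question on checking multiple strings suggests, each substring needs its own `in` test. A minimal sketch of why the condition in the question does not do what it appears to; the helper names and sample strings are made up for illustration:

```python
# `'ITEM' and 'DESCRIPTION' in line` is parsed as `'ITEM' and ('DESCRIPTION' in line)`.
# A non-empty string is always truthy, so only 'DESCRIPTION' is actually tested.
def original_check(line):
    return bool('ITEM' and 'DESCRIPTION' in line)

def both_present(line, words=('ITEM', 'DESCRIPTION')):
    # Test every word explicitly
    return all(word in line for word in words)

print(original_check('DESCRIPTION ONLY'))  # True, although 'ITEM' is missing
print(both_present('DESCRIPTION ONLY'))    # False
print(both_present('ITEM DESCRIPTION'))    # True
```

Note also that if `text` is a single string rather than a list of lines, `for line in text` iterates over characters, so no substring test on `line` can succeed.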

5 Answers

1

The following function finds the description on the line below some given pattern, e.g. "ITEM DESCRIPTION", and also ignores any blank lines that may be present in between. However, beware that the function does not handle the special case when the pattern exists, but the description does not.

txt = '''
CONTINUED ON NEXT PAGE CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED:    PAGE 4 OF 16 PAGES

SPE2DH-20-T-0133 SECTION B

PR: 0081939954 NSN/MATERIAL: 6530015627381

ITEM DESCRIPTION

BOTTLE, SAFETY CAP

BOTTLE, SAFETY CAP RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT

RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:
'''

I've assumed you have your text as a single string, so the function below splits it into a list of lines.

pattern = "ITEM DESCRIPTION" # to search for

def find_pattern_in_txt(txt, pattern):
    lines = [line for line in txt.split("\n") if line] # remove empty lines
    if pattern in lines:
        # return the line that follows the pattern
        return lines[lines.index(pattern)+1]
    return None

print(find_pattern_in_txt(txt, pattern)) # prints: "BOTTLE, SAFETY CAP"
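
The special case mentioned above, where the pattern exists but nothing follows it, could be handled along these lines. This is only a sketch: `find_line_after` and its `default` parameter are hypothetical names, and it strips surrounding whitespace before comparing, which the function above does not do.

```python
def find_line_after(txt, pattern, default=None):
    """Return the first non-blank line after the line equal to `pattern`."""
    # Strip whitespace and drop blank lines
    lines = [line.strip() for line in txt.splitlines() if line.strip()]
    # Skip the last line: nothing can follow it
    for i, line in enumerate(lines[:-1]):
        if line == pattern:
            return lines[i + 1]
    return default

sample = "PR: 0081939954\n\nITEM DESCRIPTION  \n\nBOTTLE, SAFETY CAP\n"
print(find_line_after(sample, "ITEM DESCRIPTION"))                # BOTTLE, SAFETY CAP
print(find_line_after("ITEM DESCRIPTION\n", "ITEM DESCRIPTION"))  # None
```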
Wololo
0

Try it like this:

description = False
for line in text:
    if 'ITEM DESCRIPTION' in line:
        description = True
    if description:
        print(line)

Now this will work, but you need something to stop reading the description, for example another heading, like this:

description = False
for line in text:
    if 'ITEM DESCRIPTION' in line:
        description = True
    if description:
        print(line)
    if "END OF SOMETHING" in line:  # placeholder for whatever heading ends the description
        description = False
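
The flag approach can also be wrapped in a function that collects the lines instead of printing them. A sketch, assuming `text` is an iterable of lines and that the description ends at some known marker; `collect_between` is a hypothetical name, and the stop marker used below is only a guess based on the sample in the question:

```python
def collect_between(lines, start, stop):
    """Collect non-blank lines after a line containing `start`, up to one containing `stop`."""
    collected = []
    inside = False
    for line in lines:
        if not inside:
            if start in line:
                inside = True   # start collecting from the next line
            continue
        if stop in line:
            break               # reached the end marker
        if line.strip():
            collected.append(line.strip())
    return collected

text = [
    "PR: 0081939954   NSN/MATERIAL: 6530015627381",
    "",
    "ITEM DESCRIPTION",
    "",
    "BOTTLE, SAFETY CAP",
    "",
    "BOTTLE, SAFETY CAP   RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT",
]
print(collect_between(text, "ITEM DESCRIPTION", "RPOO1"))  # ['BOTTLE, SAFETY CAP']
```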
Florian Bernard
0

Use the string method 'find', as in the following. 'find' returns the index of the string you are looking for, or -1 if it is not present, so a non-negative number shows that you have found it.

Code:

txt = "Hello, welcome to my world."
x = txt.find("welcome")
if x >= 0:  # 0 is a valid index; -1 means "not found"
    print(x)

Output:

7
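
Applied to the question, 'find' can locate the heading in one big string, and the description can then be read from what follows it. A sketch, assuming the whole document is held in a single string; `description_after` is a hypothetical helper name:

```python
def description_after(txt, heading="ITEM DESCRIPTION"):
    pos = txt.find(heading)
    if pos == -1:          # find returns -1 when the heading is absent
        return None
    rest = txt[pos + len(heading):]
    # Return the first non-blank line after the heading
    for line in rest.splitlines():
        if line.strip():
            return line.strip()
    return None

doc = "PR: 0081939954\n\nITEM DESCRIPTION\n\nBOTTLE, SAFETY CAP\n"
print(description_after(doc))           # BOTTLE, SAFETY CAP
print(description_after("no heading"))  # None
```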
Joe McKenna
0
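# Read the file and split every line into a list of words
with open("aa.txt", "r") as f:
    a = [line.split() for line in f]

# Find the index of the line that contains "ITEM" followed by "DESCRIPTION"
t1 = 0
for j in range(len(a)):
    for i in range(len(a[j]) - 1):  # stop one short so a[j][i + 1] is valid
        if a[j][i] == "ITEM" and a[j][i + 1] == "DESCRIPTION":
            t1 = j

# Print every word on the lines after the heading
for i in range(t1 + 1, len(a)):
    for j in range(len(a[i])):
        print(a[i][j], end=" ")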
Georgy
0

Use regex

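import re
pattern = re.compile(r"ITEM DESCRIPTION\n(.*)")   # if the information is directly below, with no blank line
pattern = re.compile(r"ITEM DESCRIPTION\n\n(.*)") # if there is a blank line before the information

# Read the whole file at once: a pattern containing "\n" cannot match
# when the file is processed one line at a time
with open('file.txt') as f:
    text = f.read()

for match in pattern.finditer(text):
    line_no = text.count('\n', 0, match.start(1)) + 1  # line number of the description
    print('Found on line %s: %s' % (line_no, match.group(1)))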
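
If the text is already in memory, a single multi-line search tolerates any number of blank lines after the heading. A sketch, assuming the heading sits alone on its own line; the pattern and the name `DESCRIPTION_RE` are illustrations rather than part of the answer above:

```python
import re

# Under re.MULTILINE, ^ and $ anchor to line boundaries; \s* skips blank lines
DESCRIPTION_RE = re.compile(r"^ITEM DESCRIPTION[ \t]*\n\s*(.+)$", re.MULTILINE)

txt = (
    "SPE2DH-20-T-0133   SECTION B\n\n"
    "ITEM DESCRIPTION\n\n"
    "BOTTLE, SAFETY CAP\n\n"
    "BOTTLE, SAFETY CAP   RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT\n"
)
match = DESCRIPTION_RE.search(txt)
if match:
    print(match.group(1).strip())  # BOTTLE, SAFETY CAP
```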
Bram