
I have a large text file containing a lot of text, and I would like to extract the text between two defined markers, e.g.:

    /begin MEASUREMENT XYZ
        UBYTE
        _CNV_A_R_LINEAR_____71_CM
        1
        100.
        -40.
        160.
        FORMAT "%3.0"
        SYMBOL_LINK "XYZ" 0
/begin IF_DATA EVTRKMNBXERTBK 
    DEFAULT_RASTERS 3 3
/end IF_DATA 
    /end MEASUREMENT

That is, I want to extract the text between /begin MEASUREMENT and /end MEASUREMENT.

My code is:

import re
path = r"d:\xyz.txt"
file = open(path, 'r')
lines = file.read()
pattern = re.compile(r'begin MEASUREMENT[\s][\w+](.*?)end MEASUREMENT')
print re.findall(pattern, lines)
mkHun
user2030113

2 Answers


Use the inline flag (?s), which is equivalent to re.DOTALL. By default the dot does not match a newline; with this flag it matches every character, including newlines, so (.*?) can span multiple lines.

pattern = re.compile(r'(?s)begin MEASUREMENT[\s](.*?)end MEASUREMENT')
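
To make the effect of the flag concrete, here is a minimal sketch comparing the same pattern with and without (?s). The `sample` string is a shortened, made-up stand-in for the file contents, and the pattern includes the leading slash of /end so that it is not captured.

```python
import re

# Shortened, hypothetical stand-in for the contents of the real file.
sample = "/begin MEASUREMENT XYZ\n    UBYTE\n    1\n/end MEASUREMENT"

# Without the flag, '.' stops at the first newline, so the lazy group
# can never reach "/end MEASUREMENT" and nothing is found.
without_flag = re.findall(r'begin MEASUREMENT\s(.*?)/end MEASUREMENT', sample)

# With (?s), '.' also matches '\n', so the group spans the whole block.
with_flag = re.findall(r'(?s)begin MEASUREMENT\s(.*?)/end MEASUREMENT', sample)

print(without_flag)
print(with_flag)
```

Passing flags=re.DOTALL to re.compile or re.findall should behave the same as the inline (?s).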

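So try this:

import re

path = "py.txt"
# Read the whole file into one string so the regex can span lines.
with open(path, 'r') as f:
    text = f.read()
# (?s) lets '.' match newlines; (.*?) lazily captures the block body.
pattern = re.compile(r'(?s)begin MEASUREMENT[\s](.*?)end MEASUREMENT')
result = pattern.findall(text)
# findall returns one string per block; print the first one.
print(result[0])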

EDITED

t = "XYZ"
# re.escape keeps any special characters in the name from being read as regex syntax.
pattern = re.compile(r'(?s)begin MEASUREMENT\s+((%s).*?)end MEASUREMENT' % re.escape(t))
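
The name-specific search could be wrapped in a small helper, roughly as sketched below. The function name `extract_measurement` and the two-block `sample` are hypothetical, introduced only for illustration. The sketch also assumes measurement names end in a word character, so that `\b` prevents "XYZ" from matching a longer name such as "XYZ2".

```python
import re

def extract_measurement(text, name):
    """Return the body of the MEASUREMENT block called `name`, or None."""
    # re.escape guards against regex metacharacters in the name;
    # re.DOTALL lets '.' cross line breaks inside the block.
    pattern = re.compile(
        r'/begin MEASUREMENT\s+(%s\b.*?)/end MEASUREMENT' % re.escape(name),
        re.DOTALL,
    )
    match = pattern.search(text)
    return match.group(1).strip() if match else None

# Hypothetical file contents with two blocks.
sample = (
    "/begin MEASUREMENT ABC\n    UWORD\n/end MEASUREMENT\n"
    "/begin MEASUREMENT XYZ\n    UBYTE\n/end MEASUREMENT\n"
)
print(extract_measurement(sample, "XYZ"))
```

Returning None for a missing name avoids the IndexError that indexing an empty findall result would raise.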
mkHun
  • @user2030113 This works fine for me. What result does it give you? – mkHun Apr 22 '16 at 05:21
  • Your solution worked for me. I just want to make the search more specific, so I want to include "XYZ" in the pattern, since these names will differ throughout the file. – user2030113 Apr 22 '16 at 05:25
  • I tried this and now it is working. >>> t = "xyz" >>> pattern = re.compile(r'(?s)begin MEASUREMENT[\s]%s(.*?)end MEASUREMENT' %t) – user2030113 Apr 22 '16 at 05:38

Try this:

text ="""
    /begin MEASUREMENT XYZ
        UBYTE
        _CNV_A_R_LINEAR_____71_CM
        1
        100.
        -40.
        160.
        FORMAT "%3.0"
        SYMBOL_LINK "XYZ" 0
/begin IF_DATA EVTRKMNBXERTBK 
    DEFAULT_RASTERS 3 3
/end IF_DATA 
    /end MEASUREMENT"""

# Take everything after the first start marker, then cut at the end marker.
print(text.split("/begin MEASUREMENT")[1].split("/end MEASUREMENT")[0])
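
The one-liner above returns only the first block. If the file holds several MEASUREMENT blocks, the same split idea can be looped, as in this rough sketch. The helper name `blocks_between` and the `sample` data are made up for illustration.

```python
def blocks_between(text, start="/begin MEASUREMENT", end="/end MEASUREMENT"):
    """Collect the text of every start...end block using only str.split."""
    blocks = []
    # The first piece is whatever precedes the first start marker, so skip it.
    for chunk in text.split(start)[1:]:
        # Ignore a trailing chunk whose end marker is missing.
        if end in chunk:
            blocks.append(chunk.split(end)[0].strip())
    return blocks

# Hypothetical file contents with two blocks.
sample = (
    "/begin MEASUREMENT ABC\n    UWORD\n/end MEASUREMENT\n"
    "/begin MEASUREMENT XYZ\n    UBYTE\n/end MEASUREMENT\n"
)
print(blocks_between(sample))
```

This avoids regular expressions entirely, at the cost of matching the markers only as fixed strings.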
tfv