
I have a text like the following:

(3) Reflects the adoption of SFAS No. 128, EARNINGS PER SHARE.

<PAGE>

ITEM 7. MANAGEMENT'S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS
        OF OPERATION


YEAR ENDED DECEMBER 28, 1997 COMPARED TO THE YEAR ENDED DECEMBER 29, 1996


In November 1996, the Company initiated a major restructuring and growth
plan designed to substantially reduce its cost structure and grow the business
in order to restore higher levels of profitability for the Company. By July
1997, the Company completed the major phases of the restructuring plan. The
$225.0 million of annualized cost savings anticipated from the restructuring
results primarily from the consolidation of administrative functions within 

Here, I want to extract the "MANAGEMENT'S DISCUSSION AND ANALYSIS" heading that occurs after <PAGE>. There are many other occurrences of "MANAGEMENT'S DISCUSSION AND ANALYSIS" in the document (I have not copied the whole document, as it is 1000+ pages long).

I used the following Regex expression:

pattern = ('?<=<PAGE>')('.*')('?=Management\'s Discussion')

but it gives this error:

TypeError: 'str' object is not callable

What is wrong, where, and how can I fix it?
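
A note on the error itself: in Python, a string followed by a parenthesized expression, as in `('a')('b')`, is parsed as a call on the string `'a'`, which is what raises `TypeError: 'str' object is not callable`. The following is a minimal sketch, assuming the intent was a single pattern with a lookbehind for `<PAGE>` and a lookahead for the heading; the variable names and the sample string are illustrative and not taken from the real filing.

```python
import re

# Reproduce the error: putting parentheses after a string is a call expression.
first_part = '?<=<PAGE>'
try:
    first_part('.*')  # same shape as ('?<=<PAGE>')('.*')
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# One pattern string: the groups belong inside the regex, not around Python strings.
# re.DOTALL lets "." cross newlines; ".*?" stops at the first heading after <PAGE>.
pattern = re.compile(r"(?<=<PAGE>).*?(?=MANAGEMENT'S DISCUSSION)", re.DOTALL)

# Illustrative sample based on the excerpt in the question.
sample = "<PAGE>\n\nITEM 7. MANAGEMENT'S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION"
match = pattern.search(sample)
between = match.group(0) if match else None

print(error_message)
print(repr(between))
```

The lookahead text is written in upper case because the heading in the excerpt is upper case; passing `re.IGNORECASE` alongside `re.DOTALL` would be an alternative if the casing varies between files.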

  • try changing it to `pattern = r"<-PAGE>\n*.*\KMANAGEMENT'S DISCUSSION AND ANALYSIS"` – Matt.G May 21 '18 at 18:23
  • Done. No errors, but it does not find anything. Maybe, in the encoding of the text file, the line break is not just a single '\n'. Is there any syntax to find <-PAGE> and MANAGEMENT'S DISCUSSION AND ANALYSIS with any number of characters between them? –  May 21 '18 at 18:28
  • \K is not working in python. Try `<-PAGE>\n*.*(MANAGEMENT'S DISCUSSION AND ANALYSIS)` and get group 1. See [Demo](https://regex101.com/r/mBtYEb/1) – Matt.G May 21 '18 at 18:32
  • `\n*.*` should take care of any number of `\n` followed by any number of characters – Matt.G May 21 '18 at 18:48
  • copy your text file [here](https://pastebin.com/) and share the link – Matt.G May 21 '18 at 18:55
  • Thanks. https://www.sec.gov/Archives/edgar/data/3662/00taully00950170-98-000413.txt Actually, the `<-PAGE>` is `<PAGE>`; since `<PAGE>` was not visible here, I used `<-PAGE>`. Also, there are multiple sections with MANAGEMENT'S DISCUSSION AND ANALYSIS, but I want the one that is not in the table of contents and that appears under ITEM 7. (The item number may vary from file to file.) –  May 21 '18 at 19:00
  • I'm getting access denied on that link – Matt.G May 21 '18 at 19:11
  • https://drive.google.com/file/d/1iUkWWYGjcjIepSUINCg70BBIcKUPWsAK/view?usp=sharing Please see this –  May 22 '18 at 08:29
  • See [Demo](https://regex101.com/r/jOAeJn/2) – Matt.G May 22 '18 at 12:21
  • Could you please give the Python code for this expression and show how to search with it? I'm a little new to it. Thank you. –  May 22 '18 at 16:30
  • I won't be able to help you much here, as I'm not a python developer. [This](https://docs.python.org/3/library/re.html#match-objects) could be a good starting point – Matt.G May 22 '18 at 16:53
  • Can you point me to some resources for mastering regex, starting from scratch? I'm a beginner and very poor at it. –  May 23 '18 at 07:13
  • SO has a very good FAQ on Regex. [Link](https://stackoverflow.com/a/22944075/9534819). – Matt.G May 23 '18 at 13:02
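
The approach discussed in the comments (a `<PAGE>` marker, optional line breaks, then the heading captured as group 1) can be sketched in Python roughly as follows. This is a sketch under assumptions, not a tested solution for the real filing: it assumes the wanted heading is separated from `<PAGE>` only by whitespace and starts with `ITEM <number>.`, and that the section runs until the next `<PAGE>` followed by an `ITEM <number>.` heading. The function name `extract_mdna` and the sample text are hypothetical.

```python
import re

# Heading that directly follows a <PAGE> marker (only whitespace in between).
# A table-of-contents entry is skipped as long as it is not right after <PAGE>.
HEADING = re.compile(
    r"<PAGE>\s*(ITEM\s+\d+[A-Z]?\.\s+MANAGEMENT'S DISCUSSION AND ANALYSIS)",
    re.IGNORECASE,
)
# Assumed end of the section: the next <PAGE> followed by another ITEM heading.
NEXT_ITEM = re.compile(r"<PAGE>\s*ITEM\s+\d+[A-Z]?\.", re.IGNORECASE)


def extract_mdna(text):
    """Return the text from the ITEM heading up to the next ITEM heading, or None."""
    heading = HEADING.search(text)
    if heading is None:
        return None
    start = heading.start(1)  # group 1 is the heading without the <PAGE> marker
    following = NEXT_ITEM.search(text, heading.end())
    end = following.start() if following else len(text)
    return text[start:end].strip()


# Illustrative sample only; the real file is far longer.
sample = (
    "TABLE OF CONTENTS\n"
    "Item 7. Management's Discussion and Analysis ..... 12\n"
    "<PAGE>\n\n"
    "ITEM 7. MANAGEMENT'S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS\n"
    "        OF OPERATION\n\n"
    "In November 1996, the Company initiated a major restructuring and growth plan.\n"
    "<PAGE>\n\n"
    "ITEM 8. FINANCIAL STATEMENTS\n"
)
section = extract_mdna(sample)
print(section)
```

Because `\s*` matches any run of spaces, `\n`, and `\r\n`, the pattern does not depend on how many line breaks follow `<PAGE>` or on the file's line-ending style. For a real file, `text` would come from something like `open(path).read()`, with the path and encoding depending on the file.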
