I need to extract values from the text file below:

fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk

The values I need to extract are from Start to End.

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        elif line.strip() == "End":
            copy = False
        elif copy:
            outfile.write(line)

The code above is taken from this question: Extract Values between two strings in a text file using python

This code does not write the strings "Start" and "End" themselves, only what is between them. How would you include the delimiter strings as well?

johnnydrama

3 Answers

@en_Knight has it almost right. Here's a fix to meet the OP's request that the delimiters ARE included in the output:

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy:
            outfile.write(line)
        # move this AFTER the "if copy"
        if line.strip() == "End":
            copy = False

Or simply add the write() call to each case it applies to:

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            outfile.write(line) # add this
            copy = True
        elif line.strip() == "End":
            outfile.write(line) # add this
            copy = False
        elif copy:
            outfile.write(line)

Update: to answer the question in the comment ("only use the 1st occurrence of 'End' after 'Start'"), change the elif line.strip() == "End" branch to:

        elif line.strip() == "End" and copy:
            outfile.write(line) # add this
            copy = False

This works if there is only ONE "Start" but multiple "End" lines... which sounds odd, but that is what the questioner asked.
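
For the case raised in the comments, where copying should stop for good at the first "End" after the first "Start", the same flag logic can be wrapped in a small generator that breaks out of the loop. This is only a sketch: the function name extract_first_block and the in-memory sample lines are illustrative assumptions, not part of the original code.

```python
def extract_first_block(lines, start="Start", end="End"):
    """Yield lines from the first `start` line through the first `end` line after it."""
    copying = False
    for line in lines:
        stripped = line.strip()
        if not copying and stripped == start:
            copying = True
        if copying:
            yield line  # delimiters are yielded too
            if stripped == end:
                break  # stop reading; any later Start/End lines are ignored


# Hypothetical sample: a stray "End" before the block and a second block after it.
sample = ["junk\n", "End\n", "Start\n", "Good Morning\n", "Hello World\n",
          "End\n", "Start\n", "more\n", "End\n"]
first_block = list(extract_first_block(sample))
print("".join(first_block), end="")
```

With real files, the generator can be fed the open input file directly, for example outfile.writelines(extract_first_block(infile)).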

Dan H
  • That makes a lot of sense. Is it possible to be selective and end the copy only at the 1st occurrence of 'End' after 'Start'? My file contains a number of 'End' strings. – johnnydrama Mar 03 '16 at 13:43
  • @Dan H what if there is a `Start` after `End`? How can I avoid copying this `Start` and stop copying immediately? – Catalina Feb 16 '20 at 01:17
  • @Catalina : options: 1) call exit() after you see "End". 2) count the number of starts you see; only set copy to "True" if this is the first one. – Dan H Feb 16 '20 at 01:40

The "elif" means "do this only if the previous cases failed". It is equivalent to "else if" in C-like languages. With plain "if" statements instead, every check runs for each line, so the fall-through takes care of including "Start" and "End":

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy: # flipped to include end, as Dan H pointed out
            outfile.write(line)
        if line.strip() == "End":
            copy = False
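
To make the difference concrete, here is a small side-by-side sketch that runs both variants on in-memory lines instead of files. The function names copy_exclusive and copy_inclusive and the sample list are made up for illustration.

```python
def copy_exclusive(lines):
    """elif chain: a delimiter line matches its own branch, so the copy branch is skipped."""
    out, copy = [], False
    for line in lines:
        if line.strip() == "Start":
            copy = True
        elif line.strip() == "End":
            copy = False
        elif copy:
            out.append(line)
    return out


def copy_inclusive(lines):
    """Independent ifs: all three checks run for every line, so delimiters are copied too."""
    out, copy = [], False
    for line in lines:
        if line.strip() == "Start":
            copy = True
        if copy:
            out.append(line)
        if line.strip() == "End":
            copy = False
    return out


sample = ["fdsjhgjhg\n", "Start\n", "Good Morning\n", "Hello World\n", "End\n", "dsfjkhk\n"]
exclusive = copy_exclusive(sample)
inclusive = copy_inclusive(sample)
print(exclusive)
print(inclusive)
```

The first list holds only the two inner lines, while the second also holds the "Start" and "End" lines.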
en_Knight

RegExp approach:

import re

with open('input.txt') as f:
    data = f.read()

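# re.M makes ^ and $ match at line boundaries, so the block is also found when
# "Start" is the first line of the file or "End" is the last; re.S lets . span newlines
match = re.search(r'^(Start\n.*?^End)$', data, re.M | re.S)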
if match:
    with open('output.txt', 'w') as f:
        f.write(match.group(1))
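
As a rough illustration of how such a pattern behaves, the sketch below runs a line-anchored variant on an in-memory string with two blocks. The sample text and the name block_pattern are assumptions made for the example. The lazy .*? is what makes each match stop at the first "End" line after its "Start".

```python
import re

# Hypothetical sample text containing two Start/End blocks.
text = ("fdsjhgjhg\nStart\nGood Morning\nHello World\nEnd\n"
        "dashjkhjk\nStart\nsecond\nEnd\n")

# re.M: ^ and $ match at the start and end of each line.
# re.S: . also matches newlines, so .*? can span several lines.
# .*? is lazy, so a match ends at the first End line after its Start line.
block_pattern = re.compile(r"^Start$.*?^End$", re.M | re.S)
blocks = block_pattern.findall(text)
print(blocks)
```

Using re.search instead of findall would return only the first block, which matches what the file-based code above writes.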
MaxU - stand with Ukraine
  • This is probably the more robust solution, but for someone who was unclear on elif v if, maybe you could include some textual description? – en_Knight Mar 02 '16 at 21:57
  • This is better: `(^Start[\s\S]+^End)` [Demo](https://regex101.com/r/gT0eR6/1) (Or `(^Start[\s\S]+?^End)` if there is more than 1 `End`...) – dawg Mar 02 '16 at 22:31