copy section of text in file python

Question

I need to extract values from the text file below:

fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk

The values I need to extract are from Start to End.

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        elif line.strip() == "End":
            copy = False
        elif copy:
            outfile.write(line)

The code above, which I am using, is from this question: Extract Values between two strings in a text file using python

This code does not include the strings "Start" and "End", only what is between them. How would you include the delimiter strings as well?

I would use a multiline RegExp for that; the code will also look much simpler. – MaxU - stand with Ukraine, Mar 02 '16 at 21:36

Dan H · Accepted Answer · 2016-03-04T04:04:32.440

@en_Knight has it almost right. Here's a fix to meet the OP's request that the delimiters ARE included in the output:

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy:
            outfile.write(line)
        # move this AFTER the "if copy"
        if line.strip() == "End":
            copy = False

OR simply include the write() in the case it applies to:

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            outfile.write(line) # add this
            copy = True
        elif line.strip() == "End":
            outfile.write(line) # add this
            copy = False
        elif copy:
            outfile.write(line)

Update: to answer the question in the comment ("only use the 1st occurrence of 'End' after 'Start'"), change the elif line.strip() == "End" branch to:

        elif line.strip() == "End" and copy:
            outfile.write(line) # add this
            copy = False

This works if there is only ONE "Start" but multiple "End" lines... which sounds odd, but that is what the questioner asked.
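
For reference, here is the second variant with the `and copy` guard applied, as a minimal sketch. The function name `extract_block` and the in-memory list of lines are illustrative assumptions, not part of the original answer; the point is only that an "End" seen outside a block is now skipped instead of being written.

```python
def extract_block(lines, start="Start", end="End"):
    """Collect each run from a `start` line through the next `end` line,
    delimiters included. Hypothetical helper for illustration."""
    out = []
    copy = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            out.append(line)
            copy = True
        elif stripped == end and copy:
            # Only an "End" that closes an open block is written.
            out.append(line)
            copy = False
        elif copy:
            out.append(line)
    return out


# Stray "End" lines before and after the block are ignored.
sample = ["End\n", "junk\n", "Start\n", "Hello World\n", "End\n", "End\n", "tail\n"]
result = extract_block(sample)
print("".join(result), end="")
```

Note that a later "Start" would still open a new block; the guard only deals with extra "End" lines.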

That makes a lot of sense. Is it possible to be selective and end the copy at only the 1st occurrence of 'End' after 'Start'? My file contains a number of 'End' strings. – johnnydrama, Mar 03 '16 at 13:43
@Dan H What if there is a `Start` after `End`? How do I prevent that `Start` from being copied, and stop copying immediately? – Catalina, Feb 16 '20 at 01:17
@Catalina: Options: 1) call exit() after you see "End"; 2) count the number of starts you see, and only set copy to True if this is the first one. – Dan H, Feb 16 '20 at 01:40
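
A sketch of the first option in the comment above, assuming the goal is to copy only the first Start..End block. It uses `break` rather than `exit()`, which leaves the loop without terminating the whole program. The helper name `first_block` is hypothetical.

```python
def first_block(lines, start="Start", end="End"):
    """Return only the first start..end block, delimiters included.
    Hypothetical helper for illustration."""
    out = []
    copy = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            copy = True
        if copy:
            out.append(line)
            if stripped == end:
                break  # stop reading: any later "Start" is never seen
    return out


sample = ["x\n", "Start\n", "a\n", "End\n", "Start\n", "b\n", "End\n"]
result = first_block(sample)
print("".join(result), end="")
```

When reading from a real file, the same `break` also avoids scanning the rest of the input.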

score 1 · Answer 2 · edited May 23 '17 at 12:07

The "elif" means "do this only if the preceding cases fail". It is equivalent to "else if" in C-like languages. Without it, each condition is checked independently, so the fall-through takes care of including "Start" and "End":

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy: # flipped to include end, as Dan H pointed out
            outfile.write(line)
        if line.strip() == "End":
            copy = False
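
To make the difference concrete, the following sketch runs both control-flow shapes over the sample text from the question, using `io.StringIO` in place of real files. The function names `with_elif` and `with_if` are made up for this comparison.

```python
import io

TEXT = (
    "fdsjhgjhg\nfdshkjhk\nStart\nGood Morning\n"
    "Hello World\nEnd\ndashjkhjk\ndsfjkhk\n"
)


def with_elif(infile):
    """elif chain: a delimiter line never reaches the copy branch."""
    out, copy = [], False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        elif line.strip() == "End":
            copy = False
        elif copy:
            out.append(line)
    return "".join(out)


def with_if(infile):
    """Independent ifs: each line is tested against all three conditions."""
    out, copy = [], False
    for line in infile:
        if line.strip() == "Start":
            copy = True
        if copy:
            out.append(line)
        if line.strip() == "End":
            copy = False
    return "".join(out)


exclusive = with_elif(io.StringIO(TEXT))
inclusive = with_if(io.StringIO(TEXT))
print(repr(exclusive))
print(repr(inclusive))
```

The first call returns only the two inner lines; the second also returns the "Start" and "End" lines.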

score 1 · Answer 3 · answered Mar 02 '16 at 21:44

RegExp approach:

import re

with open('input.txt') as f:
    data = f.read()

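# re.M anchors ^ and $ at line boundaries, so the match also works when
# "Start" is the first line or "End" is the last; re.S lets . match newlines.
match = re.search(r'(^Start$.*?^End$)', data, re.M | re.S)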
if match:
    with open('output.txt', 'w') as f:
        f.write(match.group(1))

MaxU - stand with Ukraine

This is probably the more robust solution, but for someone who was unclear on elif v if, maybe you could include some textual description? – en_Knight Mar 02 '16 at 21:57
This is better: `(^Start[\s\S]+^End)` [Demo](https://regex101.com/r/gT0eR6/1) (Or `(^Start[\s\S]+?^End)` if there is more than 1 `End`...) – dawg Mar 02 '16 at 22:31
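
The greedy and lazy patterns from the comment above behave differently once the text contains more than one "End". A small sketch, assuming the `re.M` flag is set so that `^` matches at the start of each line (the sample string is invented for illustration):

```python
import re

# Invented sample with two "End" lines after a single "Start".
data = "junk\nStart\nGood Morning\nEnd\nmore\nEnd\ntail\n"

# Greedy: [\s\S]+ runs as far as possible, so the match ends at the LAST "End".
greedy = re.search(r'(^Start[\s\S]+^End)', data, re.M)

# Lazy: [\s\S]+? stops as early as possible, at the FIRST "End" after "Start".
lazy = re.search(r'(^Start[\s\S]+?^End)', data, re.M)

print(repr(greedy.group(1)))
print(repr(lazy.group(1)))
```

Because neither pattern anchors the end of the line, a line such as "Ending" would also satisfy `^End`; appending `$` to the pattern is one way to require an exact delimiter line.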
