perl script to copy file content which is between certain lines

Question

I am new to Perl scripting and need help with the following problem. I have many files containing details of persons. From each file, I want to print the contents that come after one particular line and before another. For example, one of the files contains the following details:

My name is XYZ.
Address: ***
ID:12414
Country:USA
End XYZ

Another file contains details like:

My name is ABC.
Address: ###
ID:124344
Country:Singapore
End ABC

I want to print the lines from the first file after My name is XYZ and before End XYZ into my new file. Similarly, I want to print the contents from the second file after My name is ABC and before End ABC, into my new file.

I wrote the logic as below, but I am not sure of the Perl syntax to print the lines after one particular line and before another.

while(<file1>)
{
    if () # if we read the phrase "My name" in file1 start printing after this line
    {
        print  #print the contents into file3(output file)
        if() # if we read the phrase "End" in file1 stop printing the content into file3
    }
}

I hope my question is clear. Any help is appreciated.

@ysth In your pattern `1` means line number then why you used `^` and `$`? — mkHun, Aug 17 '16 at 04:24
@mkHun no, it is matching against the return of `..`; see http://perldoc.perl.org/perlop.html#Range-Operators and http://perlmonks.org/?node_id=525406 — ysth, Aug 17 '16 at 05:48
Does anybody know why _all_ answers -- even the accepted one -- were downvoted? They all got positive feedback in the comments. I know that everyone can vote to his liking but this doesn't seem justified. — PerlDuck, Aug 18 '16 at 09:49

score -1 · Answer 1 · edited May 23 '17 at 12:08


You can get the lines between My name is <name>. and End <name> with one of several regexes.

Lazy:

My name is ([^\n]+)\.(.*?)End \1

Greedy:

My name is ([^\n]+)\.(.*)End \1

Optimized:

My name is ([^\s]+)\.((?:(?!End \1)[^\n]*\n)+)End \1

Either way, you'll need the s modifier. If more than one thing needs to be parsed in a file, you will need the g modifier.

The back-references ensure a match without needing to know the name. This means that the content you want will be in capture group 2.

What's the difference between the three regexes? Speed! Depending on how many files you need to parse, you may need the speed.

The optimized one is the best if there is significant variance in what you are parsing. It works the same way as this other regex I wrote. (You should do some testing if speed is important.)

It should be fairly straightforward to write the code from this.
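As a rough illustration of how the lazy pattern might be applied, here is a minimal sketch. It assumes each file is small enough to slurp into memory and that the file names arrive on the command line; the helper name extract_sections is hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between each "My name is <name>."
# line and its matching "End <name>" line found in $content.
sub extract_sections {
    my ($content) = @_;
    my @sections;
    # /s lets . match newlines; /g finds every record in the content.
    while ( $content =~ /My name is ([^\n]+)\.(.*?)End \1/sg ) {
        my $body = $2;            # capture group 2 holds the wanted lines
        $body =~ s/\A\n//;        # drop the newline that ends the "My name" line
        push @sections, $body;
    }
    return @sections;
}

# Assumption: the files to read are named on the command line.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot open $file: $!";
    my $content = do { local $/; <$in> };    # slurp the whole file
    close $in;
    print for extract_sections($content);
}
```

Run as, for example, perl script.pl file1 file2 > newfile.txt. Swapping in the greedy or optimized pattern only requires changing the regex inside the helper.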

edited May 23 '17 at 12:08

Community


answered Aug 17 '16 at 03:13

Laurel


Thanks for the help. I think in your pattern 1 means the line number. What if we had lines prior to "My name ...", and we don't know the line number of "My name ..."? How would the regex differ in that case? Also, if there are lines in between "My name..." and "End..." which we want to omit by filtering them out based on our requirement, how would the regex be implemented in this case? I think this is complex. Appreciate your help. – V. Tej Aug 17 '16 at 05:35
@V.Tej I have tested these and they work, even if your file has lines before `My name`. The `\1` does not refer to line number; it is a backreference, as I said. It would be easy to modify the regex to filter out lines. – Laurel Aug 17 '16 at 14:56

Steven F Kohler · Accepted Answer · 2016-08-18T01:09:50.383

-1

OK. I believe your question is about the Perl syntax to print to the output file. I will try to give you a slightly more complete solution based on the description of what you are trying to do. This is just a quick, very simple code example. (For some reference you may also want to look at http://perlmaven.com/slurp.)

First, let's call your new file "newfile.txt". Then let's call your source file(s) "sourcefile.txt". Here is some code with comments:

use strict;
use warnings;

# First I would turn on autoflush ($|); note that this affects the
# currently selected output handle, which is STDOUT by default
$|++;

# Now open newfile.txt for writing the information you want
open my $NEWFILE, '>', 'newfile.txt' or die "Cannot open newfile.txt: $!";

# Now open sourcefile.txt (or iterate over a list of them)
open my $SOURCEFILE, '<', 'sourcefile.txt' or die "Cannot open sourcefile.txt: $!";

# Now go through the sourcefile and get the info you want to
# add to your newfile

# set a variable to print data to newfile - initialize to
# N or false, so lines before "My name" are not copied
my $data_wanted = "N";

# start reading lines from the sourcefile

while (<$SOURCEFILE>) {
      # Test to see if data is between My name and End
      if ($_ =~ /^My name/ ) {
          # start marker: copy from the next line onwards
          $data_wanted = "Y";
          next;
      }
      elsif ($_ =~ /^End/ ) {
          # end marker: stop copying, so lines after it are skipped
          $data_wanted = "N";
          next;
      }
      elsif ($_ =~ /^STUFF TO OMIT/) {
          # skip only this line; the flag keeps its current value
          next;
      }

      if ( $data_wanted eq "Y" ) {
          print $NEWFILE $_;
      }

      # you don't really need this but
      # it will show you how this works in perl
      next;

}  # end of while

# finish by closing the files

close $SOURCEFILE;
close $NEWFILE;

##########################################
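For the "iterate over a list of them" case, a minimal sketch might wrap the flag-based loop in a helper and call it once per source file. The helper name copy_section and the use of command-line arguments for the file list are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copies the lines between a /^My name/ line and the
# next /^End/ line from the input handle $in to the output handle $out.
sub copy_section {
    my ($in, $out) = @_;
    my $data_wanted = 0;    # false until a "My name" line has been seen
    while ( my $line = <$in> ) {
        if ( $line =~ /^My name/ ) { $data_wanted = 1; next; }
        if ( $line =~ /^End/ )     { $data_wanted = 0; next; }
        print {$out} $line if $data_wanted;
    }
    return;
}

# Assumption: the source file names are given on the command line, and
# everything is written to the single output file newfile.txt.
if (@ARGV) {
    open my $NEWFILE, '>', 'newfile.txt' or die "Cannot open newfile.txt: $!";
    for my $file (@ARGV) {
        open my $SOURCEFILE, '<', $file or die "Cannot open $file: $!";
        copy_section( $SOURCEFILE, $NEWFILE );
        close $SOURCEFILE;
    }
    close $NEWFILE or die "Cannot close newfile.txt: $!";
}
```

Because the flag starts out false in every call, lines before "My name" and after "End" are left out for each file independently.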

Hope this helps ;-)

edited Aug 18 '16 at 01:09

answered Aug 17 '16 at 03:27

Steven F Kohler


Thanks for the logic. It helps me to an extent. Just to add on to your logic, could you write another case in your while loop to omit the lines of code from the sourcefile.txt before the line containing "My name" – V. Tej Aug 17 '16 at 05:23
Also, if there are lines in between "My name..." and "End..." which we want to filter. How can we omit them out while copying the contents from SOURCEFILE to NEWFILE. Thanks for your help in advance. – V. Tej Aug 17 '16 at 05:37
For additional line filtering, you can add an additional "elsif" statement to my answer below. Hope it helps. – Lye Heng Foo Aug 17 '16 at 10:02
I am busy right now. I will make the mod to the code in approx 8 hours from now. – Steven F Kohler Aug 17 '16 at 15:07
ok Sure. Thanks for the help. Also, when you edit it please consider the case if we had statements in the text file after "End XYZ" and we just want to copy the contents until before the "End XYZ" line. I tried using "last" keyword, to break the loop, but it didn't work. – V. Tej Aug 17 '16 at 21:16
@Steven. Thanks for the code edit. I understand the logic. – V. Tej Aug 18 '16 at 18:24

score -1 · Answer 3 · answered Aug 17 '16 at 09:57


Is this what you are looking for?

while (<>) {
    if ( /^My name / .. /^End / ) {
        if ( /^My name / ) {
            # Do nothing, or anything you would like for this line.
        } elsif ( /^End / ) {
            # Do nothing, or anything you would like for this line.
        } else {
            print $_;
        }
    }
}
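The nested tests can also be driven by the value that .. returns. In scalar context the operator yields a sequence number, with "E0" appended on the line that closes the range. A minimal sketch of that variant follows; the helper name lines_between and the command-line file list are assumptions, not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines strictly between a /^My name / line
# and the next /^End / line, read from the filehandle $in.
sub lines_between {
    my ($in) = @_;
    my @kept;
    while ( my $line = <$in> ) {
        # Scalar .. is the flip-flop operator: false outside the range,
        # otherwise a sequence number starting at 1, with "E0" appended
        # on the line that ends the range.
        my $seq = ( $line =~ /^My name / ) .. ( $line =~ /^End / );
        next unless $seq;         # outside the range
        next if $seq == 1;        # the "My name" line itself
        next if $seq =~ /E0$/;    # the "End" line itself
        push @kept, $line;
    }
    return @kept;
}

# Assumption: the files to read are named on the command line.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot open $file: $!";
    print lines_between($in);
    close $in;
}
```

Each .. operator keeps its state across calls to the subroutine that contains it, so a file with no "End" line would leave the range open when the next file is read.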


Lye Heng Foo


Yes, the logic looks similar to the code you wrote. What do the .. in the second line of your code indicate. Does it mean, any line between "My name" and "End". – V. Tej Aug 17 '16 at 21:10
Yes, ".." means all lines between matching "My name" and "End". – Lye Heng Foo Aug 18 '16 at 02:17
For additional line filtering (as requested in the comment above), you can add an additional "elsif" statement to the inner if conditions. Please let me know if you need further help. – Lye Heng Foo Aug 18 '16 at 02:22
Thanks for the help. The edited code above is clear enough. – V. Tej Aug 18 '16 at 18:26
