1

I am new to Perl and trying to use a regex to extract the part of a string between two delimiters that I know will be present. I have already tried various answers from Stack Overflow, but none of them seems to work for me. Here's my example.

The required data is in the $info variable, from which I want to extract the useful part:

my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

The useful data in the above string is Boston, MA. I removed the newlines from the string with $info =~ s/\n//g;. Now $info holds the string "random text i do not want|BIRTH PLACE=Boston, MA|more unwanted random text". I thought this would make it easier to capture the required data.

Please help me extract the required data. I am sure that it will always be preceded by |BIRTH PLACE= and followed by |. Everything before and after that is unwanted text. If a question like this has already been answered, please point me to it as well. Thanks.

SagarG

4 Answers

3

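Instead of replacing everything around it, you could search the original string for /\|BIRTH PLACE=([^\|]+)\n\|/, where [^\|]+ matches one or more characters that are not a pipe. Note that this pattern expects the newline before the closing pipe, so it will not match once the newlines have been stripped.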
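
As a minimal sketch of the pattern above, here it is applied to the sample string from the question with its newlines intact. The variable names $place and $place_alt are made up for illustration, and the second pattern is a hypothetical variation rather than part of the answer: it excludes the newline in the character class so the trailing \n\| is not needed.

```perl
use strict;
use warnings;

# Sample string from the question, with its newlines left in place.
my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# Pattern from the answer: capture non-pipe characters, then require "\n|".
# A match in list context returns the captured groups.
my ($place) = $info =~ /\|BIRTH PLACE=([^\|]+)\n\|/;

# Hypothetical variation (not from the answer): also excluding "\n" from the
# class removes the need to match the trailing newline and pipe.
my ($place_alt) = $info =~ /\|BIRTH PLACE=([^|\n]+)/;

print "place:     $place\n";
print "place_alt: $place_alt\n";
```

Inside a character class the pipe has no special meaning, so [^|] and [^\|] behave the same.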

Flo
2
$info =~ m{\|BIRTH PLACE=(.*?)\|} or die "There is no data in \$info?!";
my $birth_place = $1;

That should do the trick.
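
One caveat worth checking: the non-greedy pattern above relies on the newlines already having been stripped, because "." does not match "\n" unless the /s modifier is used. The sketch below runs it against both forms of the sample string. The variable names and the \s* variation are assumptions added for illustration, not part of the answer.

```perl
use strict;
use warnings;

my $raw = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# Case 1: newlines stripped, as in the question.
(my $flat = $raw) =~ s/\n//g;
my ($from_flat) = $flat =~ m{\|BIRTH PLACE=(.*?)\|};

# Case 2: newlines intact. "." cannot cross the "\n" that sits before the
# closing pipe, so the same pattern finds nothing and the variable stays undef.
my ($no_match) = $raw =~ m{\|BIRTH PLACE=(.*?)\|};

# Hypothetical variation: allow optional whitespace before the closing pipe,
# which works on both the raw and the flattened string.
my ($from_raw) = $raw =~ m{\|BIRTH PLACE=(.*?)\s*\|};

print "flattened:         $from_flat\n";
print "raw, as written:   ", (defined $no_match ? $no_match : '(no match)'), "\n";
print "raw, with \\s*:     $from_raw\n";
```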

1

You know, actually, those newlines might have helped you. I would have gone for an initial regular expression of:

/^\|BIRTH PLACE=(.*)$/m

The multiline modifier (m) makes ^ match at the beginning of a line and $ at the end of it, instead of only at the beginning and end of the string. Heck, you can even get really crazy and match:

/(?<=^\|BIRTH PLACE=).+$/m

This matches only the information you want, using a lookbehind ((?<= ... )) to assert that it is preceded by the birth place label.

Why curse the string twice when you can do it once?

So, in perl:

if ($info =~ m/(?<=^\|BIRTH PLACE=).+$/m) {
    print "Born in $&.\n";
} else {
    print "From parts unknown\n";
}
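
For comparison, here is a small sketch that runs the capture-group pattern from this answer next to a \K variant on the question's sample string. \K (available since Perl 5.10) is not mentioned in the answer; it is shown only as an assumed alternative to the lookbehind, and the variable names are made up.

```perl
use strict;
use warnings;

my $info = "random text i do not want\n|BIRTH PLACE=Boston, MA\n|more unwanted random text";

# /m lets ^ and $ match at line boundaries, so the newlines act as delimiters.
# A match in list context returns the contents of the capture group.
my ($captured) = $info =~ /^\|BIRTH PLACE=(.*)$/m;

# \K drops everything matched so far from the overall match, so $& holds
# only the text after the label, much like the lookbehind version.
my $kept = $info =~ /^\|BIRTH PLACE=\K.+$/m ? $& : undef;

print "capture group: $captured\n";
print "\\K variant:    $kept\n";
```

The capture-group form avoids $& altogether, which matters on Perl versions before 5.20, where using $& anywhere in a program slows down every regex match.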
FrankieTheKneeMan
1

You have presumably read this data from a file into a single string, which is a bad start. Your program should read the file line by line, like this:

use strict;
use warnings;

use autodie;

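# autodie makes open die with a useful message on failure
open my $fh, '<', 'myfile';

my $pob;
while (<$fh>) {
  # . does not match the trailing newline, so $1 is the rest of the line
  if (/BIRTH PLACE=(.+)/) {
    $pob = $1;
    last;    # stop at the first match
  }
}

print $pob;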

output

Boston, MA
Borodin