I need to extract two groups from a text like this:

Beginning | (Lorem Ipsum)
NextLine: Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem

The two groups are the "Lorem Ipsum" inside the parentheses and the long text "Lorem Ipsum is simply dummy text...". This is the regex, which works fine on https://regex101.com/ :

((?<=\|\s\().*(?=\))).*\n.*((?<=NextLine:\s).*$)

But this regex doesn't work in Perl: when executed from the CLI, it returns nothing.

This is the Perl command being executed in the CLI (with the gm options):

cat File | perl -ne '/((?<=\|\s\().*(?=\))).*\n.*((?<=NextLine:\s).*$)/gm && print $1 . ", " . $2'
DummyBeginner
  • 1
    You need to slurp the whole file (not read it line by line) since your regex is trying to match multiple lines. Try `perl -0777 -ne ' .... '` – Håkon Hægland Apr 21 '20 at 08:30
  • Thanks @HåkonHægland. it solved that. – DummyBeginner Apr 21 '20 at 08:44
  • @HåkonHægland However, it only returns the first occurrence in the file and doesn't go on to the following lines? – DummyBeginner Apr 21 '20 at 08:52
  • 1
    Since you used the `g` (global) modifier to the regex, it should return all matches, but it will be in an array ( not `$1` and `$2`) unless you use a while loop – Håkon Hægland Apr 21 '20 at 08:54
  • 1
    Try `perl -0777 -ne 'while (/((?<=\|\s\().*(?=\))).*?\n.*?(?<=NextLine:\s)(.*?)$/gm) { print $1 . ", " . $2}' File` to print all the matches – Håkon Hægland Apr 21 '20 at 08:57
  • @HåkonHægland Thanks a lot, it works like a charm. However, I had assumed the -n flag would handle the looping? I had a file with contents like `first Line\n DNS Query for "host" was successful | (Lorem Ipsum)` repeating, and all the occurrences of **host** and **Lorem Ipsum** were extracted simply by this Perl command (without the `gm` options and the while loop?!) : `perl -n -e'/((?<=DNS Query for ").*?(?=")).*?((?<=\().*(?=\)))/ && print $1 . ", " . $2 . "\n"'` – DummyBeginner Apr 21 '20 at 09:15

0 Answers