Regex in perl, match newline AND first word of next line

Question

I have a file that looks like

title="title1"  
artist="artist1"  
title="title2"  
artist="artis2"  
title="title3"  
artist="artist3"

And so on

This command

perl -pe 's/title="(.*?)"\n//ig' list.txt

is not working as I'd hoped. Run on its own, it leaves just the artist lines, but if I do this

perl -pe 's/title="(.*?)"\nartist//ig' list.txt

It doesn't match at all.
I've tried with and without /g, and with the addition of /m. I've looked at the file in nano, and I don't see any additional characters between the final " on each line and the "artist" on the next.

Anyone know what I'm doing wrong? (I'm using perl rather than sed, because the regex that generates this list uses a negative lookahead).

My goal is to be able to use a line like below
perl -pe 's/title="(.*?)"\nartist="(.*?)"(?:\n|$)/\2 - \1/ig' list.txt

That would output something like

artist1 - title1  
artist2 - title2  
artist3 - title3

Open it in `vim` and issue the command `:set list` to see if you might have other unprinted characters in the way, _e.g._ Windows-style newlines, `\r\n`. — Andrew Cheong, Jan 25 '16 at 00:33

Borodin · Answer 1 · 2016-07-04T09:35:25.140

Your substitution

s/title="(.*?)"\n//ig

is replacing any line that looks like title="xxx" with nothing. It is deleting those lines.

It's unclear what you want, but if your requirement is to remove the title= and the quotes then you should use

perl -pe 's/title="(.*?)"/$1/i' myfile

The /g modifier is superfluous unless you expect more than one title on a single line of the file.

Update

If you want to pair titles with artists then you really need a script file. This should do what you need. The data is taken directly from your question

use strict;
use warnings 'all';
use feature 'say';

my $title;

while ( <DATA> ) {

    if ( /title="([^"]*)"/ ) {
        $title = $1;
    }
    elsif ( /artist="([^"]*)"/ ) {
        say "$1 - $title";
    }
}


__DATA__
title="title1"
artist="artist1"
title="title2"
artist="artis2"
title="title3"
artist="artist3"

output

artist1 - title1
artis2 - title2
artist3 - title3

dawg · Accepted Answer · 2016-01-25T01:20:38.043

For a "slurp" approach, you can use this regex:

(^title="([^"]+)")\s*\R(^artist="([^"]+)")\s*(?:\R|\z)

Then given your example:

$ echo "$art" 
title="title1"  
artist="artist1"  
title="title2"  
artist="artis2"  
title="title3"  
artist="artist3"

Just "slurp" the file with -0777 and print $2 and $4:

$ echo "$art" | perl -0777 -lne 'while (/(^title="([^"]+)")\s*\R(^artist="([^"]+)")\s*(?:\R|\z)/gm) { print "$4 - $2\n"}'
artist1 - title1
artis2 - title2
artist3 - title3

edited Jan 25 '16 at 01:20

answered Jan 25 '16 at 01:14

Slurp mode looks like it does what I want, and my original regex looks like it will work, I used this regex: 's/title="(.*?)"\nartist="(.*?)"(?:\n|$)/\2 - \1\n/ig' – Trel Jan 25 '16 at 08:09
(To add to that last comment, I didn't need to do the modification you did to negate the ", as I used ? on the .* to make it lazy.) – Trel Jan 25 '16 at 08:11
Great. You may want to consider two modifications: 1) Use `\R` or `$` rather than `\n` in the regex. The `\R` is a metacharacter for any sequence of line endings (windows, etc) and 2) you may want to add `\h*` or `\s*` after the closing quotes to catch invisible trailing line endings like you had in your example. So like this: `^title="(.*?)"\h*\R^artist="(.*?)"\h*$` – dawg Jan 25 '16 at 14:22
*"I didn't need to do the modification you did to negate the `"`"* Non-greedy matches can be fickle, and I highly recommend that you stick with `"([^"]*)"`. There are many posts on Stack Overflow like [*Non-greedy regex acts greedily*](http://stackoverflow.com/questions/8971608/non-greedy-regex-acts-greedily) where people have misunderstood what a non-greedy match does – Borodin Jan 26 '16 at 17:15
@Borodin Normally I'd agree, but in my case, lazy matching works here since the file is in a specific format, and won't ever have a case that doesn't work in this manner since I'm generating the data it's processing. (Dawg, it wasn't from me, no idea he did since he's agreeing with your original comment) – Trel Jan 27 '16 at 16:12

Gene · Answer 3 · 2016-01-25T00:57:13.917

You never mentioned what you're trying to do. If you want to extract the titles and artists, you'll want something like this:

our $s = q|
title="title1"
artist="artist1"
title="title2"
artist="artis2"
title="title3"
artist="artist3"
|;

my @matches = $s =~ /^title="(.*?)".*?^artist="(.*?)"/smg;

print join(';', @matches);

This prints

title1;artist1;title2;artis2;title3;artist3

edited Jan 25 '16 at 00:57

answered Jan 25 '16 at 00:45

Sorry fixed this a minute ago. Guess you can't see it yet. Just missed the end of the line when I copied the text. – Gene Jan 25 '16 at 00:58

Casimir et Hippolyte · Answer 4 · 2016-01-25T02:07:21.987

If your file is exactly as you describe it, you can use this command, which reads two lines at once. That way you avoid slurp mode:

perl -pe '$_.=<>;s/.*?"(.*?)".*?"(.*?)"/$2 - $1/s' file

If you need something more explicit, you can use:

perl -pe 'if (/^title="/){$_.=<>;s/^.*?"(.*?)"\h*\Rartist="(.*?)"\h*/$2 - $1/}' file

edited Jan 25 '16 at 02:07

answered Jan 25 '16 at 01:24
