3

I have a file that looks like this:

title="title1"  
artist="artist1"  
title="title2"  
artist="artis2"  
title="title3"  
artist="artist3"

and so on.

This command:
perl -pe 's/title="(.*?)"\n//ig' list.txt

is not working as I'd hoped. Run on its own, it leaves just the artist lines, but if I do this:

perl -pe 's/title="(.*?)"\nartist//ig' list.txt

It doesn't match at all.
I've tried with and without /g, and I've tried adding /m. I've looked at the file in nano, and I don't see any additional characters between the final " on each line and the "artist" on the next.

Anyone know what I'm doing wrong? (I'm using perl rather than sed, because the regex that generates this list uses a negative lookahead).

My goal is to be able to use a line like the one below:
perl -pe 's/title="(.*?)"\nartist="(.*?)"(?:\n|$)/\2 - \1/ig' list.txt

That would output something like

artist1 - title1  
artist2 - title2  
artist3 - title3
Trel

4 Answers

3

Your substitution

s/title="(.*?)"\n//ig

is replacing any line that looks like title="xxx" with nothing. It is deleting those lines.

It's unclear what you want, but if your requirement is to remove the title= and the quotes then you should use

perl -pe 's/title="(.*?)"/$1/i' myfile

The /g modifier is superfluous unless you expect several titles on a single line of the file.



Update

If you want to pair titles with artists then you really need a script file. This should do what you need. The data is taken directly from your question.

use strict;
use warnings 'all';
use feature 'say';

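my $title;    # most recent title seen

while ( <DATA> ) {

    if ( /title="([^"]*)"/ ) {
        $title = $1;          # remember the title until its artist line arrives
    }
    elsif ( /artist="([^"]*)"/ ) {
        say "$1 - $title";    # $1 is the artist captured on this line
    }
}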


__DATA__
title="title1"
artist="artist1"
title="title2"
artist="artis2"
title="title3"
artist="artist3"

Output:

artist1 - title1
artis2 - title2
artist3 - title3
Borodin
2

For a "slurp" approach, you can use this regex:

(^title="([^"]+)")\s*\R(^artist="([^"]+)")\s*(?:\R|\z)

Then given your example:

$ echo "$art" 
title="title1"  
artist="artist1"  
title="title2"  
artist="artis2"  
title="title3"  
artist="artist3"

Just "slurp" the file with -0777 and print $2 and $4:

$ echo "$art" | perl -0777 -lne 'while (/(^title="([^"]+)")\s*\R(^artist="([^"]+)")\s*(?:\R|\z)/gm) { print "$4 - $2\n"}'
artist1 - title1
artis2 - title2
artist3 - title3
dawg
  • Slurp mode looks like it does what I want, and my original regex looks like it will work, I used this regex: 's/title="(.*?)"\nartist="(.*?)"(?:\n|$)/\2 - \1\n/ig' – Trel Jan 25 '16 at 08:09
  • (To add to that last comment, I didn't need to do the modification you did to negate the ", as I used ? on the .* to make it lazy.) – Trel Jan 25 '16 at 08:11
  • Great. You may want to consider two modifications: 1) Use `\R` or `$` rather than `\n` in the regex. `\R` matches any single line-ending sequence (`\n`, `\r\n`, and so on), and 2) you may want to add `\h*` or `\s*` after the closing quotes to catch invisible trailing whitespace like you had in your example. So like this: `^title="(.*?)"\h*\R^artist="(.*?)"\h*$` – dawg Jan 25 '16 at 14:22
  • *"I didn't need to do the modification you did to negate the `"`"* Non-greedy matches can be fickle, and I highly recommend that you stick with `"([^"]*)"`. There are many posts on Stack Overflow like [*Non-greedy regex acts greedily*](http://stackoverflow.com/questions/8971608/non-greedy-regex-acts-greedily) where people have misunderstood what a non-greedy match does – Borodin Jan 26 '16 at 17:15
  • @Borodin Normally I'd agree, but in my case, lazy matching works here since the file is in a specific format, and won't ever have a case that doesn't work in this manner since I'm generating the data it's processing. (Dawg, it wasn't from me, no idea he did since he's agreeing with your original comment) – Trel Jan 27 '16 at 16:12
1

You never mentioned what you're trying to do. If you want to extract the titles and artists, you'll want something like this:

our $s = q|
title="title1"
artist="artist1"
title="title2"
artist="artis2"
title="title3"
artist="artist3"
|;

my @matches = $s =~ /^title="(.*?)".*?^artist="(.*?)"/smg;

print join(';', @matches);

This prints

title1;artist1;title2;artis2;title3;artist3
Gene
  • Sorry fixed this a minute ago. Guess you can't see it yet. Just missed the end of the line when I copied the text. – Gene Jan 25 '16 at 00:58
1

If your file is exactly as you describe it, you can use this command, which reads two lines at once. This avoids slurp mode:

perl -pe '$_.=<>;s/.*?"(.*?)".*?"(.*?)"/$2 - $1/s' file

If you need something more explicit, you can use:

perl -pe 'if (/^title="/){$_.=<>;s/^.*?"(.*?)"\h*\Rartist="(.*?)"\h*/$2 - $1/}' file
Casimir et Hippolyte