Perl regex capture groups and reshuffle pattern

Question

I am using Perl regex capture groups to replace a pattern in a large number of files.

File example 1:

title="alpha" lorem ipsum lorem ipsum name="beta"

File example 2:

title="omega" Morbi posuere metus purus name="delta"

with

title="beta" lorem ipsum lorem ipsum

title="delta" Morbi posuere metus purus

using

find . -type f -exec perl -pi -w -e 's/title="(?'one'.*?)"(?'three'.*?)name="(?'two'.*?)"/title="\g{two}"\g{three}/g;' \{\} \;

(Note that (1) the attribute values of title and name are unknown and vary, and (2) the content between title="alpha" and name="beta" differs from file to file.)

I am still learning Perl regex. What am I doing wrong?

Look at the color coding in your post. You've confused StackOverflow (and your shell) with all of the quotation marks that don't nest. Once things get this complicated, I recommend writing out a Perl script as a file so you don't have to deal with this kind of shenanigans. — Silvio Mayolo, Jul 29 '22 at 16:53
I'm puzzled... this question is identical to https://stackoverflow.com/questions/73166757/perl-regex-capture-groups/73170354#73170354 which was posted 4 hours _before_ this one, but the older question is closed in favor of the copy? — Erwin, Jul 29 '22 at 21:42
@Erwin, thank you for noticing that, I thought other one was opened later(it was late night for me), made this one dupe of other and reopened other one. In case anyone thinks it's not correct, feel free to reopen this one, cheers. — RavinderSingh13, Jul 30 '22 at 00:40
@Erwin I see that, and it is certainly not right, but I don't see that either can be marked as a dupe of the other given how new they are (and have no clearly established or accepted answers) -- specially seeing that this one received more attention (three answers). So I voted to reopen this. Perhaps transfer your answer from there to this one? I am flagging this for the attention of moderators, since it's the same question by different users — zdim, Jul 30 '22 at 03:07
@RavinderSingh13 See the above comment, addressed to Erwin. I think the best we can do in this case is to flag it for moderators. — zdim, Jul 30 '22 at 03:08

anubhava · Accepted Answer · 2022-07-31T13:26:27.307

This perl command line should work:

perl -pe 's/(title=)"?[^"\s]*"?(.*) name="?([^"\s]+)"?/$1"$3"$2/' file

title="beta" lorem ipsum lorem ipsum
title="delta" Morbi posuere metus purus

Explanation:

(title=): Match title= and capture in group #1
"?[^"\s]*"?: Match the old title value, an optionally quoted string without spaces (not captured)
(.*): Match 0 or more of any chars and capture in group #2
 name="?: Match a space and the name= text, followed by an optional "
([^"\s]+): Match the name value, a string without quotes or spaces, and capture in group #3
"?: Optional closing "
$1"$3"$2: Replacement part

edited Jul 31 '22 at 13:26

answered Jul 29 '22 at 16:58

For learning's sake, what if I have `name=beta` instead of `name="beta"`(without double quotes)? I seem to have problem including the second quotation mark: `-type f -exec perl -pi -w -e 's/(title=")(.*)(")(.*) name=(.*)/$1$5$3$4/' \{\} \;` – Jul 31 '22 at 13:05
Right. But what if I want to keep `"` in `title`? That is, from `title="alpha" lorem ipsum lorem ipsum name=beta` to `title="beta" lorem ipsum lorem ipsum`? – Jul 31 '22 at 13:20
OK, check my updated answer and demo – anubhava Jul 31 '22 at 13:24
Nice answer - often times you want `[^something]` instead of `.*?` - which to be honest doesn't always work. – AnArrayOfFunctions Aug 16 '22 at 01:38

zdim · Answer 2 · 2022-08-20T07:36:59.350

A bit of syntax: capture with (?<name>pattern) and then use that capture with $+{name} outside of the pattern (the delimiters may vary); see perlre. The whole regex:

s{ title="(?<t>[^"]+)" (?<text>.*?) name="(?<n>[^"]+)" }
 {title="$+{n}"$+{text}}x

The \g{name} syntax attempted in the question is a backreference: it is used inside the pattern itself, when a group captured earlier has to be matched again later in the same pattern. Once matching is done, that is, in the replacement part or in code after the regex, the named captures are retrieved from the %+ variable.

The [^"] is a negated character class, matching any character other than ". The /x modifier at the end makes the pattern ignore literal whitespace, so spaces can be used inside it for readability.

A full example, with the above regex, to run on the command line

echo title=\"alpha\" lorem ipsum lorem ipsum name=\"beta\" | perl -wpe \
's{title="(?<t>[^"]+)"(?<text>.*?)name="(?<n>[^"]+)"}{title="$+{n}"$+{text}}'

(broken into two lines for readability). It prints

title="beta" lorem ipsum lorem ipsum

I am not sure why the question captures the first value (the old title), but perhaps there is more to it than shown, so it is captured here as well, into $+{t}.

Also, the question uses those quotes in a particular way. One can string together '-delimited strings for one command-line program (perl -wE'say''"hi"' is valid). The example in the question works since what would be "barewords" (one etc) happen to be inside regex, where they are OK, as patterns. But I'd suggest not to mess with that (if that was the intent).

RavinderSingh13 · Answer 3 · 2022-07-29T17:14:40.653

1st solution: Since you are already using the shell's find command, here is an awk alternative in case that is acceptable to you; written and tested in GNU awk.

awk -v s1="\"" '
match($0,/(title=)"[^"]*" (.*)name="([^"]*)"/,arr){
  print arr[1] s1 arr[3] s1,arr[2]
}
'  Input_file

Explanation: This uses GNU awk's match function, whose optional third argument stores the regex's capture groups in an array. The regex (title=)"[^"]*" (.*)name="([^"]*)" creates three capturing groups, whose values are stored in the array arr at indexes 1, 2 and 3. The print statement then reassembles those values in the order the OP wants.

2nd solution: With sed, using the same regex (anchored at the start of the line) and the -E (ERE) option, try the following.

sed -E 's/^(title=)"[^"]*" (.*)name="([^"]*)"/\1"\3" \2/' Input_file
