Performing regex capture and then substitute using SED/PERL

Question

I have data that looks like this (let's call this file submit.txt):

dir1/pmid_5409464.txt
dir1/pmid_5788247.txt
dir1/pmid_4971884.txt

What I want to do is perform an in-place regex substitution on the file so that it results in the following:

perl mycode.pl /home/neversaint/dir1/pmid_5409464.txt > /home/neversaint/dir1/pmid_5409464.output
perl mycode.pl /home/neversaint/dir1/pmid_5788247.txt > /home/neversaint/dir1/pmid_5788247.output
perl mycode.pl /home/neversaint/dir1/pmid_4971884.txt > /home/neversaint/dir1/pmid_4971884.output

Is there a SED/Perl one liner to do that?

My difficulty is in capturing the input file name and then creating the output file name (.output) from it, for each line. I'm stuck with this:

sed 's/^/perl mycode.pl \/home\/neversaint\/dir1\//g' submit.txt |
sed 's/$/ >/'

`awk '{print "xxx/x/y/"$0 "> xxxxxxxx/$0}' list > output`? good luck. — shellter, Jul 31 '13 at 02:30
No that won't do. The point is for every line capture the `pmid_xxx` from `pmid_xxx.txt` and print the output version of that `pmid_xxx.output` also for each line. — neversaint, Jul 31 '13 at 02:36
sed http://stackoverflow.com/questions/2777579/how-to-output-only-captured-groups-with-sed — Ciro Santilli OurBigBook.com, Nov 19 '15 at 08:15

score 15 · Accepted Answer · edited May 23 '17 at 12:24

You can use escaped parentheses to capture groups, and access the groups with \1, \2, etc.

sed 's/^\(.*\)\.txt$/perl mycode.pl \/home\/neversaint\/\1.txt > \/home\/neversaint\/\1.output/' submit.txt

output:

perl mycode.pl /home/neversaint/dir1/pmid_5409464.txt > /home/neversaint/dir1/pmid_5409464.output
perl mycode.pl /home/neversaint/dir1/pmid_5788247.txt > /home/neversaint/dir1/pmid_5788247.output
perl mycode.pl /home/neversaint/dir1/pmid_4971884.txt > /home/neversaint/dir1/pmid_4971884.output

Edit: standard sed does not appear to have built-in in-place file editing (GNU sed has the -i option). It is still possible to do, but this solution just prints to standard output. You could also use a Perl one-liner, as shown here: sed edit file in place
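
As a rough sketch of the in-place variant mentioned in the edit: the attached-suffix form -i.bak is, as far as I know, accepted by both GNU and BSD sed, and it keeps a backup of the original. The scratch directory and sample lines below are made up for illustration.

```shell
# Work in a throwaway directory (hypothetical sample data) so no real file is touched.
tmpdir=$(mktemp -d)
printf '%s\n' dir1/pmid_5409464.txt dir1/pmid_5788247.txt > "$tmpdir/submit.txt"

# -i.bak rewrites submit.txt in place and saves the original as submit.txt.bak.
sed -i.bak 's/^\(.*\)\.txt$/perl mycode.pl \/home\/neversaint\/\1.txt > \/home\/neversaint\/\1.output/' "$tmpdir/submit.txt"

cat "$tmpdir/submit.txt"
```

If the result is wrong, the untouched input is still available in submit.txt.bak.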

answered Jul 31 '13 at 02:39

hmatt1

thanks so much. BTW is there a way I can split your code into multiple lines; It's easier to read that way in my editor, later I realized. – neversaint Jul 31 '13 at 03:07
You're welcome! You could use shell variables to split it up, similar to this: http://stackoverflow.com/questions/8078872/can-a-long-sed-command-be-broken-over-several-lines. Basically store the search string in one variable, and replace string in another. I don't know if this would help much since the replace string would still be pretty long. You could also put the search and replace part in a file, and call it using the sed -f option – hmatt1 Jul 31 '13 at 03:17
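
A minimal sketch of both ideas from this comment, under a few assumptions: the variable names (base, search, replace), the script file name rewrite.sed, and the sample data are all hypothetical. The script-file version splits the work into two shorter commands instead of one long one.

```shell
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' dir1/pmid_5409464.txt dir1/pmid_5788247.txt > "$tmpdir/submit.txt"

# Option 1: keep the search and replace strings in shell variables.
# '|' is used as the delimiter so the slashes in the path need no escaping.
base='/home/neversaint'
search='^\(.*\)\.txt$'
replace="perl mycode.pl $base/\\1.txt > $base/\\1.output"
from_vars=$(sed "s|$search|$replace|" "$tmpdir/submit.txt")

# Option 2: put the commands in a file and run it with sed -f.
cat > "$tmpdir/rewrite.sed" <<'EOF'
# Only touch lines ending in .txt: strip the suffix, then wrap the stem (&).
/\.txt$/ {
  s|\.txt$||
  s|.*|perl mycode.pl /home/neversaint/&.txt > /home/neversaint/&.output|
}
EOF
from_file=$(sed -f "$tmpdir/rewrite.sed" "$tmpdir/submit.txt")

printf '%s\n' "$from_vars"
```

Both options should print the same lines; the script file is probably easier to read in an editor, since each command sits on its own line.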

score 1 · Answer 2 · answered Jul 31 '13 at 02:41

You asked for a Sed one-liner, you got it.

sed 's/\([^.]*\)\.txt/perl mycode.pl \/home\/neversaint\/\1.txt > \/home\/neversaint\/\1.output/' submit.txt > output.txt

AlienHoboken

Use another separator instead of / when you have many slashes in the string (e.g. file names). Sed also works with _, | or :. – Attila O. May 23 '17 at 08:53
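
To illustrate the comment, here is the same capture-group substitution sketched with | as the delimiter, so none of the slashes in the path need escaping. The sample input line is just one of the lines from the question.

```shell
line='dir1/pmid_5409464.txt'

# With '|' as the s-command delimiter, '/' is an ordinary character.
result=$(printf '%s\n' "$line" |
  sed 's|^\(.*\)\.txt$|perl mycode.pl /home/neversaint/\1.txt > /home/neversaint/\1.output|')

echo "$result"
```

The chosen delimiter must not appear unescaped in the pattern or the replacement, so _ would be a poor choice here because the file names contain underscores.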

Ramg · Answer 3 · 2013-07-31T06:37:03.873

The Perl one-liner for doing the same is:

perl -pe "s@(.*?)(\.txt)@perl mycode.pl /home/neversaint/\\1\\2 > /home/neversaint/\\1.output@g" submit.txt

The above command prints the substituted text to the console; redirect the output to another file if you want to save it.

To replace within the file itself (in-place edit), add the -i option. For example:

perl -pe "s@(.*?)(\.txt)@perl mycode.pl /home/neversaint/\\1\\2 > /home/neversaint/\\1.output@g" -i submit.txt

The above will perform a replace within the submit.txt file itself.
