Remove the "^M" at the end of each line when it doesn't match (Perl)

Question

How can I remove the "^M" at the end of each line when it doesn't match? The command is:

perl -pe 's{(">)[^</zone>]}{$1</zone>}g' $travail_dir/zones.txt >$travail_dir/ys.txt

I get this in my ys.txt:

<zone^M
numero_page="005"></zone>^M
<zone^M
id_zone="2"^M
numero_page="005"></zone>

How can I modify my command to get the expected result without "^M"?

`[^</zone>]` is just wrong, did you mean `(">)(?!</zone>)`? Try `perl -pe 's/\r//g;s{(">)(?!</zone>)}{$1</zone>}g'` — Wiktor Stribiżew, Oct 14 '20 at 12:06
Thanks @WiktorStribiżew yes you're right, you saved me, thx — YSA, Oct 14 '20 at 12:23
[Why is it such a bad idea to parse XML with regex?](https://stackoverflow.com/a/1732454/1030675) — choroba, Oct 14 '20 at 12:29
@choroba It is true that parsing valid XML with regex is a bad idea, but here, the XML is invalid and OP tries to make it valid to make it parsable. — Wiktor Stribiżew, Oct 14 '20 at 12:31
Perhaps you got the file from MS Windows and are using it on Linux. Windows and Unix use different line endings. You can make the required change with the editor [vim](https://www.vim.org/), which is installed in most cases: open the file with `vim filename`, issue the command `:set ff=unix`, and save with `:wq`. Or, if the `dos2unix` utility is available on your system, you can use that. — Polar Bear, Oct 14 '20 at 18:16

score 1 · Accepted Answer · answered Oct 14 '20 at 12:27

There are two things here:

`[^</zone>]` matches any single character other than `<`, `/`, `z`, `o`, `n`, `e` and `>`; it does not mean *any text other than `</zone>`*. You need a negative lookahead here, `(?!</zone>)`, which fails the `">` match if it is immediately followed by `</zone>`.
`^M` characters are carriage returns (CR). You can either remove them with `dos2unix` before passing the file to perl, or remove them with a separate substitution, `s/\r//g`.

You can use

perl -pe 's/\r//g;s{(">)(?!</zone>)}{$1</zone>}g' $travail_dir/zones.txt > $travail_dir/ys.txt
