How can I remove the "^M" at the end of each line when the pattern doesn't match? The command is:

perl -pe 's{(">)[^</zone>]}{$1</zone>}g' $travail_dir/zones.txt >$travail_dir/ys.txt

I get this in my ys.txt:

<zone^M
numero_page="005"></zone>^M
<zone^M
id_zone="2"^M
numero_page="005"></zone>

How can I modify my command to get the expected result without the "^M"?

Wiktor Stribiżew
YSA
  • `[^</zone>]` is just wrong, did you mean `(">)(?!</zone>)`? Try `perl -pe 's/\r//g;s{(">)(?!</zone>)}{$1</zone>}g'` – Wiktor Stribiżew Oct 14 '20 at 12:06
  • Thanks @WiktorStribiżew yes you're right, you saved me, thx – YSA Oct 14 '20 at 12:23
  • [Why is it such a bad idea to parse XML with regex?](https://stackoverflow.com/a/1732454/1030675) – choroba Oct 14 '20 at 12:29
  • @choroba It is true that parsing valid XML with regex is a bad idea, but here, the XML is invalid and OP tries to make it valid to make it parsable. – Wiktor Stribiżew Oct 14 '20 at 12:31
  • Perhaps you got the file from MS Windows and are using it on Linux. Windows and Unix use different line endings. You can convert the file with the editor [vim](https://www.vim.org/), which is installed in most cases: open the file with `vim filename`, run `:set ff=unix` in the editor, then save and quit with `:wq`. Or, if the `dos2unix` utility is available on your system, you can use that. – Polar Bear Oct 14 '20 at 18:16

1 Answer

There are two things here:

  • `[^</zone>]` is a character class: it matches any single character other than `<`, `/`, `z`, `o`, `n`, `e` and `>`. It does not mean *any text other than* `</zone>`. Use a negative lookahead instead, `(?!</zone>)`, which fails the `">` match when it is immediately followed by `</zone>`.
  • `^M` is how a carriage return (CR, `\r`) is displayed. You can either remove the CRs with `dos2unix` before passing the file to perl, or strip them with a separate substitution, `s/\r//g`.

You can use

perl -pe 's/\r//g;s{(">)(?!</zone>)}{$1</zone>}g' $travail_dir/zones.txt > $travail_dir/ys.txt
Wiktor Stribiżew