
I need to replace \x0d\x0a with \x2c\x0d\x0a in a file.

The following does not do anything:

awk '{if NR> 1 {gsub(/\x0D\x0A/,"\x2C\x0D\x0A"); print}}' test.csv > testfixed.csv

$ xxd test.csv
00000e0: 350d 0a45 4941 2d39 3330 2c44 6169 6c79  5..EIA-930,Daily
00000f0: 2c4e 5949 532c 2c55 5443 302c 3030 3132  ,NYIS,,UTC0,0012
kvantour
daimne

2 Answers


You are trying to make a substitution of the hex string \x0D\x0A which is nothing more than CRLF or \r\n.

Since awk by default splits its input into records on the <newline> character (which is LF), the <newline> character \n (or \x0a) is never part of a record, so there is no need to match it. All you need to do is replace \r with ,\r (0x2c is the hex value of ,). This should do the trick:

awk '(NR>1){sub("\r$",",\r"); print}' file
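A quick way to check this on a small sample is to dump the bytes of the result. This is only a sketch: the file names sample.csv and sample_fixed.csv and their contents are made up for illustration.

```shell
# Hypothetical three-line sample with CRLF line endings.
printf 'h1,h2\r\na,b\r\nc,d\r\n' > sample.csv

# Same command as above: insert a comma before the trailing CR
# on every record after the first, then print that record.
awk '(NR>1){sub("\r$",",\r"); print}' sample.csv > sample_fixed.csv

# od -c shows \r and \n explicitly, so the ",\r\n" endings are visible.
od -c sample_fixed.csv
```

The dump should show a,b,\r\n followed by c,d,\r\n. Because print sits inside the NR>1 block, the first line is not written at all; if it should be kept, print would have to move into its own unconditional block.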

So why was your script failing?

As mentioned before, awk works in records and the default record separator is the <newline> character. This means that the <newline> character, also written as \n and having hexadecimal value \x0a, is never part of the record $0, so the pattern /\x0D\x0A/ never matched. Also, the print statement automatically appends the output record separator ORS after the record. By default this is again the <newline> character, so you did not have to substitute that either. All you had to do was:

awk 'NR > 1 {sub(/\x0D$/,"\x2C\x0D"); print}' test.csv > testfixed.csv

So is it possible to substitute by means of hexadecimal values?

Yes, clearly it is!

echo -n "Hello World" | awk 'sub(/\x57\x6f\x72\x6c\x64/,"\x43\x6f\x77")'

But how can I change <newline>?

You can just redefine the output record separator ORS:

awk -v ORS="whatever" '1'
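As a small illustration of the ORS approach, the sketch below turns LF line endings into CRLF without touching the records themselves. The two-line input is made up for the example.

```shell
# The pattern '1' is always true, so every record is printed unchanged;
# only the separator written after each record differs from the default.
# awk processes the escape sequences in the -v assignment, so ORS becomes CR LF.
printf 'a\nb\n' | awk -v ORS='\r\n' '1' | od -c
```

The dump should show a \r \n b \r \n, which confirms that the <newline> is written by print through ORS rather than being stored in the record.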

Also, using GNU awk, you can follow glenn jackman's solution.


Very much related: Why does my tool output overwrite itself and how do I fix it?

kvantour

The newline \n or \x0A will not appear in each record because by default it is the record separator.
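This can be checked directly with a one-line sketch (the input "ab" plus CRLF is sample data chosen for illustration): the record keeps the carriage return but not the newline.

```shell
# Print the record length, the position of \r and the position of \n.
# index() returns 0 when the character is not found.
printf 'ab\r\n' | awk '{ print length($0), index($0, "\r"), index($0, "\n") }'
# prints: 3 3 0
```

The record is three characters long (a, b and \r), the \r sits at position 3, and \n is not found.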

I would do this: define the input and output record separators to be \r\n and then for line number > 1, append a comma to the record:

$ printf "a\r\nb\r\nc\r\n" >| file

$ hexdump -C file
00000000  61 0d 0a 62 0d 0a 63 0d  0a                       |a..b..c..|
00000009

$ awk 'BEGIN {RS = ORS = "\r\n"} NR > 1 {$0 = $0 ","} 1' file | hexdump -C
00000000  61 0d 0a 62 2c 0d 0a 63  2c 0d 0a                 |a..b,..c,..|
0000000b
glenn jackman