Need to substitute \x0d\x0a to \x2c\x0d\x0a in a file

Question

I need to substitute \x0d\x0a to \x2c\x0d\x0a in a file

The following does not do anything:

awk '{if NR> 1 {gsub(/\x0D\x0A/,"\x2C\x0D\x0A"); print}}' test.csv > testfixed.csv

$ xxd test.csv
00000e0: 350d 0a45 4941 2d39 3330 2c44 6169 6c79  5..EIA-930,Daily
00000f0: 2c4e 5949 532c 2c55 5443 302c 3030 3132  ,NYIS,,UTC0,0012

do simple text substitution. Also your file has lowercase 0d 0a. — karakfa, Oct 23 '18 at 16:12
Have a look at the [awk tag info page](https://stackoverflow.com/tags/awk/info) for help with your syntax. — glenn jackman, Oct 23 '18 at 16:58

kvantour · Answer 1 · 2018-10-23T17:18:37.727

You are trying to make a substitution of the hex string \x0D\x0A which is nothing more than CRLF or \r\n.

Since awk by default splits its records on the <newline> character (which is LF), that character is never part of a record, so you never have to match \n (or \x0a). All you need to do is replace the trailing \r with ,\r (0x2c is the hex value of the comma). So this should do the trick:

awk '(NR>1){sub("\r$",",\r"); print}' file

So why was your script failing?

As mentioned before, awk works on records, and the default record separator is the <newline> character. This means that the <newline> character, also written as \n and having hexadecimal value \x0a, is never part of the record $0, so a pattern containing \x0A can never match. Also, the print statement automatically appends the output record separator ORS after the record, and by default this is again the <newline> character, so you did not have to substitute that. Note also that an if condition needs parentheses in awk, as in if (NR > 1) {...}. All you had to do was:

awk 'NR > 1 {sub(/\x0D$/,"\x2C\x0D"); print}' test.csv > testfixed.csv
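A quick way to check this is to feed awk a small CRLF sample and inspect the bytes. The following is a minimal sketch, assuming a POSIX shell with printf, awk, od and tr available; the sample data and the variable names lens and fixed are made up for illustration. Note that, as written, the command prints only the records after the first one, so the first line is dropped from the output.

```shell
# Each record keeps its trailing \r, because only \n is consumed as the separator.
# Hypothetical sample: three CRLF-terminated lines.
lens=$(printf 'h1,h2\r\na,b\r\nc,d\r\n' | awk '{ print length($0) }')
echo "$lens"

# Insert a comma before the trailing \r on every record after the first,
# then dump the result as a compact hex string.
fixed=$(printf 'h1,h2\r\na,b\r\nc,d\r\n' |
  awk 'NR > 1 { sub(/\r$/, ",\r"); print }' |
  od -An -tx1 | tr -d ' \n')
echo "$fixed"   # 612c622c0d0a632c642c0d0a
```

The lengths come out as 6, 4 and 4, one more than the visible characters on each line, which is the \r still sitting at the end of $0. In the hex string, every 0d0a is now preceded by 2c.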

So is it possible to substitute by means of hexadecimal values?

Yes, clearly it is!

echo -n "Hello World" | awk 'sub(/\x57\x6f\x72\x6c\x64/,"\x43\x6f\x77")'
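One caveat: the \x escapes are accepted by GNU awk, but POSIX awk only specifies octal escapes of the form \ddd, so other awk implementations may not treat \x the same way. A sketch of the same substitution using octal escapes, assuming a made-up two-line CRLF sample and a hypothetical variable name portable:

```shell
# \015 is CR (0x0d) and \054 is a comma (0x2c), written as octal escapes.
portable=$(printf 'h\r\na\r\n' |
  awk 'NR > 1 { sub(/\015$/, "\054\015"); print }' |
  od -An -tx1 | tr -d ' \n')
echo "$portable"   # 612c0d0a
```

The second record a\r becomes a,\r, and print appends the \n, giving the bytes 61 2c 0d 0a.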

But how can I change <newline>?

You can just redefine the output record separator ORS:

awk -v ORS="whatever" '1'
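As a concrete instance, assuming the input uses plain LF line endings, setting ORS to \r\n rewrites every line ending as CRLF. A minimal sketch (the sample input and the variable name crlf are only for illustration):

```shell
# Records are split on \n; each one is printed followed by ORS, here \r\n.
crlf=$(printf 'a\nb\n' | awk -v ORS='\r\n' '1' | od -An -tx1 | tr -d ' \n')
echo "$crlf"   # 610d0a620d0a
```

The escape sequences in the -v assignment are interpreted by awk, so ORS really holds the two bytes 0d 0a.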

Also, using GNU awk, you can follow glenn jackman's solution.

Very much related: Why does my tool output overwrite itself and how do I fix it?

glenn jackman · Answer 2 · 2018-10-23T16:54:02.887

The newline \n or \x0A will not appear in each record because by default it is the record separator.

I would do this: define the input and output record separators to be \r\n and then for line number > 1, append a comma to the record:

$ printf "a\r\nb\r\nc\r\n" >| file

$ hexdump -C file
00000000  61 0d 0a 62 0d 0a 63 0d  0a                       |a..b..c..|
00000009

$ awk 'BEGIN {RS = ORS = "\r\n"} NR > 1 {$0 = $0 ","} 1' file | hexdump -C
00000000  61 0d 0a 62 2c 0d 0a 63  2c 0d 0a                 |a..b,..c,..|
0000000b

Ah yes, requires GNU awk for multi-character RS variable – glenn jackman Oct 23 '18 at 17:32
