(I have pasted the exact text and command I ran, so this may look a bit messy.)

I have a .TXT file that looks like this:

11111111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111111

The output I am looking for is:

11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111

The command I tried is:

sed -i 's/\(.\{14\}\)\(.\{7\}\)\(.\{2\}\)\(.\{1\}\)\(.\{3\}\)\(.\{13\}\)\(.\{1\}\)\(.\{8\}\)\(.\{16\}\)\(.\{3\}\)/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,/' SOME.TXT

The output I got was:

11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,1111111111111110,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,1111111111111110,111

I have no idea why these 0s suddenly appeared, or why the last ',' is not inserted where I asked for it, even though the command works for the first nine fields.

Is this a bug in sed?

Inian
gggert

3 Answers

The output contains a 0 because sed supports only nine back-references, \1 through \9, so \10 is interpreted as \1 followed by a literal 0.
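
To see this in isolation, here is a minimal sketch with ten single-character groups. The function name demo_backref and the sample input are made up for illustration; the point is only that the replacement \10 yields the first group plus a 0, not the tenth group.

```shell
# Ten one-character capture groups; the replacement asks for \10.
# sed reads that as \1 followed by a literal 0, not as group ten.
demo_backref() {
    sed 's/\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)\(.\)/[\10]/'
}

printf 'abcdefghij\n' | demo_backref   # prints [a0], not [j]
```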

You can solve it easily using the FIELDWIDTHS feature of GNU awk (the trailing *, meaning "the rest of the record", needs gawk 4.2 or later):

awk -v OFS=, 'BEGIN { FIELDWIDTHS = "14 7 2 1 3 13 1 8 16 3 *" } {$1 = $1} 1' file
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
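
If gawk is not available, roughly the same result can be had from any POSIX awk with substr. This is a sketch, not part of the original answer: the function name split_fixed and the widths variable are hypothetical, and the widths are the ones from the question.

```shell
# Portable alternative: cut each line at the given widths with substr().
# Reads stdin, writes the comma-separated line to stdout.
split_fixed() {
    awk -v widths='14 7 2 1 3 13 1 8 16 3' '
    BEGIN { n = split(widths, w, " ") }
    {
        out = ""
        pos = 1
        for (i = 1; i <= n; i++) {
            out = out substr($0, pos, w[i]) ","   # field i plus separator
            pos += w[i]
        }
        print out substr($0, pos)                 # whatever is left over
    }'
}

# Example: a 71-character line of 1s, as in the question.
printf '%071d\n' 0 | tr 0 1 | split_fixed
```

Note that a line with nothing left after the last fixed-width field would end in a trailing comma with this sketch.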

Just as an academic exercise, here is a working sed command that solves this using two substitutions:

sed -E 's/(.{14})(.{7})(.{2})(.)(.{3})(.{13})(.)(.{8})(.+)/\1,\2,\3,\4,\5,\6,\7,\8,\9/; s/(.+,.{16})(.{3})(.*)/\1,\2,\3/' file
anubhava
  • 1
    Thank you so much. I am not really familiar with awk command yet so I am gonna try that later also but the second suggestion worked perfectly for me! – gggert Aug 11 '20 at 10:44
  • 2
    @gggert use the awk solution. Whether you're more familiar with sed or not surely you can see how much simpler the awk solution is. – Ed Morton Aug 11 '20 at 14:23
  • 1
    nice answer and useful GNU awk feature. – thanasisp Sep 20 '20 at 05:44
sed can't reference capture groups above 9, but Perl can:

perl -i -pe 's/(.{14})(.{7})(.{2})(.)(.{3})(.{13})(.)(.{8})(.{16})(.{3})/$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,/' SOME.TXT
choroba
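If you insist on using sed, you can do something like this, inserting the commas from right to left so that each insertion does not shift the positions still to be processed: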

sed 's/./&,/68;s/./&,/65;s/./&,/49;s/./&,/41;s/./&,/40;s/./&,/27;s/./&,/24;s/./&,/23;s/./&,/21;s/./&,/14' test.txt
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
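
The order matters here. A small sketch on a six-character string (the function names and the input are invented for illustration) shows what the numeric flag does and why the substitutions run from the highest position down:

```shell
# s/./&,/N appends a comma to the Nth character of the line.
# Right to left: the comma added at position 4 does not move position 2.
insert_commas_rtl() { sed 's/./&,/4;s/./&,/2'; }

# Left to right: the comma added at position 2 becomes character 3,
# so the second substitution lands one character too early.
insert_commas_ltr() { sed 's/./&,/2;s/./&,/4'; }

printf 'abcdef\n' | insert_commas_rtl   # prints ab,cd,ef
printf 'abcdef\n' | insert_commas_ltr   # prints ab,c,def
```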
Maroun