0

i m trying to perform the following substitution on lines of the general format:

BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......

as you see the problem is that its a comma separated file, with a specific field containing a comma decimal. I would like to replace that with a dot .

I ve tried this, to replace the first occurence of a pattern after match, but to no avail, could someone help me?

sed -e '/,"/!b' -e "s/,/./"
sed -e '/"/!b' -e ':a' -e "s/,/\./"

Thanks in advance. An awk or perl solution would help me as well. Here's an awk effort:

gawk -F "," 'substr($10, 0, 3)==3 && length($10)==12 { gsub(/,/,".", $10); print}'

That yielded the same file unchanged.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185
onlyf
  • 767
  • 3
  • 19
  • 39

6 Answers6

3

CSV files should be parsed in awk with a proper FPAT variable that defines what constitutes a valid field in such a file. Once you do that, you can just iterate over the fields to do the substitution you need

gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")"; OFS="," } 
       { for(i=1; i<=NF;i++) if ($i ~ /[,]/) gsub(/[,]/,".",$i);}1' file

See this answer of mine to understand how to define and parse CSV file content with FPAT variable. Also see Save modifications in place with awk to do in-place file modifications like sed -i''.

Inian
  • 80,270
  • 14
  • 142
  • 161
  • You don't need `if ($i ~ /[,]/)` - the `gsub(/[,]/...)` just won't do anything if there's no match for `[,]`. Also that will delete empty fields - you should use `*`s instead of `+`s in the `FPAT` setting. – Ed Morton May 20 '19 at 13:23
1

The following sed will convert all decimal separators in quoted numeric fields:

 sed 's/"\([-+]\?[0-9]*\)[,]\?\([0-9]\+\([eE][-+]\?[0-9]+\)\?\)"/"\1.\2"/g'

See: https://www.regular-expressions.info/floatingpoint.html

kvantour
  • 25,269
  • 4
  • 47
  • 72
1

This might work for you (GNU sed):

sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta' file

This regexp matches a , within a pair of "'s and replaces it by a .. The regexp is anchored to the start of the line and thus needs to be repeated until no further matches can be matched, hence the :a and the ta commands which causes the substitution to be iterated over whilst any substitution is successful.

N.B. The solution expects that all double quotes are matched and that no double quotes are quoted i.e. \" does not appear in a line.

potong
  • 55,640
  • 6
  • 51
  • 83
1

If your input always follows that format of only one quoted field containing 1 comma then all you need is:

$ sed 's/\([^"]*"[^"]*\),/\1./' file
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC, .......

If it's more complicated than that then see What's the most robust way to efficiently parse CSV using awk?.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185
0

Assuming you have this: BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC

Try this: awk -F',' '{print $1,$2,$3,$4"."$5,$6,$7}' filename | awk '$1=$1' FS=" " OFS=","

Output will be: BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC

You simply need to know the field numbers for replacing the field separator between them.

Skeptic Biped
  • 92
  • 1
  • 8
0

In order to use regexp as in perl you have to activate extended regular expression with -r. So if you want to replace all numbers and omit the " sign, then you can use this:

echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/g'

If you want to replace first occurrence only you can use that:

echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/1'

https://www.gnu.org/software/sed/manual/sed.txt

Borislav Markov
  • 1,495
  • 11
  • 12