1

I've recently discovered Miller (mlr) for batch process csv files and love it, however it appears that numbers in the original data formatted as currency are recognized as strings by Miller, so trying to perform mathematical functions on them returns an error.

Here's a dummy example:

mlr --csv --opprint put '$total=$amt+$tax1+$tax2' data.csv

which returns:

name     amt    tax1  tax2  total
producta $50.00 $2.50 $4.50 (error)
productb $60.00 $3.00 $5.40 (error)
productc $70.00 $3.50 $6.30 (error)
productd $80.00 $4.00 $7.20 (error)
producte $90.00 $4.50 $8.10 (error)

The best work around I've found is using ssub on each individual column to remove the dollar sign, and then I can add up the data.

Example:

mlr --csv --opprint put '$amt=ssub($amt,"$","");$tax1=ssub($tax1,"$","");$tax2=ssub($tax2,"$","")' then put '$total=fmtnum($amt+$tax1+$tax2,"%3.2f")' data.csv

which returns:

name     amt   tax1 tax2 total
producta 50.00 2.50 4.50 57.00
productb 60.00 3.00 5.40 68.40
productc 70.00 3.50 6.30 79.80
productd 80.00 4.00 7.20 91.20
producte 90.00 4.50 8.10 102.60

While this workaround is functional, it isn't very efficient. In csv files with a lot of columns, this method will require lots of scrubbing and is prone to error. Is there a better way to get Miller to recognize currency formatted numbers as numbers rather than text?

Hurston
  • 48
  • 7

1 Answers1

2

it's not possible.

You can use a global search & replace

mlr --csv put -S '                                                                                              1 ↵
  for (k in $*) {
    $[k] = gsub($[k], "[$]", "");
  }
' then merge-fields -k -a sum -r '(amt|tax)' -o out input.csv

to have

+----------+-------+------+------+------------+
| name     | amt   | tax1 | tax2 | out_sum    |
+----------+-------+------+------+------------+
| producta | 50.00 | 2.50 | 4.50 | 57.000000  |
| productb | 60.00 | 3.00 | 5.40 | 68.400000  |
| productc | 70.00 | 3.50 | 6.30 | 79.800000  |
| productd | 80.00 | 4.00 | 7.20 | 91.200000  |
| producte | 90.00 | 4.50 | 8.10 | 102.600000 |
+----------+-------+------+------+------------+

And in this way to have a lot of columns is not a problem

aborruso
  • 4,938
  • 3
  • 23
  • 40