1

I have the following csv where I have to replace the thousand comma separator with nothing. In example below, when I have the amount "1,000.00" I should have 1000.00 (no comma, no quotes) instead.

I use JREPL to remove header from my csv

jrepl "(?:.*\n){1,1}([\s\S]*)" "$1" /m /f "csv/Transactions.csv" /o "csv/Transactionsfeed.csv")

I was wondering if I could do the process of removing header + dealing with the thousand comma in one step. I am also opened to the option of doing it with another command in a second step...

Tnx ID,Trace ID - Gateway,Profile,Customer PIN,Customer,Ext. ID,Identifier,Amount,Chrg,Curr,Processor,Type,Status,Created By,Date Created,RejectReason
1102845,3962708,SL,John,Mohammad Alo,NA,455015*****9998,900.00,900.00,$,Un,Credit Card,Rejected,Internet,2016-05-16 06:54:10,"-330: Fail by bank, try again later(refer to acquirer)"
1102844,3962707,SL,John,Mohammad Alo,NA,455015*****9998,"1,000.00","1,000.00",$,Un,Credit Card,Rejected,Internet,2016-05-16 06:52:26,"-330: Fail by bank, try again later(refer to acquirer)"
dbenham
  • 127,446
  • 28
  • 251
  • 390
R_life_R
  • 786
  • 6
  • 26

2 Answers2

1

Yes, there is a very efficient and fairly compact and straight-forward solution:

jrepl "\q(\d{1,3}(?:,\d{3})*(?:\.\d*)*)\q" "$1.replace(/,/g,'')" /x /j /jendln "if (ln==1) $txt=false" /f "csv/Transactions.csv" /o "csv/Transactionsfeed.csv"

The /JENDLN JScript expression strips the header line by setting $txt to false if it is the first line.

The search string matches any quoted number that contains commas as thousand separators, and $1 is the number without the quotes.

The replace string is a JScript expression that replaces all commas in the matching $1 number with nothing.

EDIT

Note that the above will likely work with any CSV that you are likely to have. However, it would fail if you have a quoted field that contains a quoted number string literal. Something like the following would yield a corrupted CSV with the code above:

...,"some text ""123,456.78"" more text",...

This issue can be fixed with a bit more regex code. You only want to modify a quoted number if the opening quote is preceded by a comma or the beginning of the line, and the closing quote should be followed by a comma or the end of line.

A look-ahead assertion can be used for the trailing comma/EOL. But JREPL does not support look-behind. So the leading comma/BOL must be captured and preserved in the replacement

jrepl "(^|,)\q(\d{1,3}(?:,\d{3})*(?:\.\d*)*)\q(?=$|,)" "$1+$2.replace(/,/g,'')" /x /j /jendln "if (ln==1) $txt=false" /f "csv/Transactions.csv" /o "csv/Transactionsfeed.csv"

EDIT in response to changing requirement in comment

The following will simply remove all quotes and commas from quoted CSV fields. I don't like this concept, and I suspect there is a much better way to handle this for import into mysql, but this is what the OP is asking for.

jrepl "(^|,)(\q(?:[^\q]|\q\q)*\q)(?=$|,)" "$1+$2.replace(/,|\x22/g,'')" /x /j /jendln "if (ln==1) $txt=false" /f "csv/Transactions.csv" /o "csv/Transactionsfeed.csv"
dbenham
  • 127,446
  • 28
  • 251
  • 390
  • your solution works perfectly for the comma in thousands & the first row removal. I only noticed an error in case of a comma separated string between double quotes, as in this example: the field "Special Bonus, OK allan" makes it fail ("," in field should be erased too) Tnx ID,Trace ID - Gateway,Profile,Customer PIN,Customer,Ext. ID,Identifier,Amount,Chrg,Curr,Processor,Type,Status,Created By,Date Created,RejectReason 1102921,,SL,jojo,Mohamed,"Special Bonus, OK allan",,200.00,200.00,$,ManualBonus,Generic,Approved,maurice,2016-05-16 15:26:26 – R_life_R May 16 '16 at 21:36
  • @RenaudDUGERT - Huh! Why should all commas within quotes be removed? That certainly was not in your stated requirements. And blindly removing all commas from quoted fields could alter the meaning. I understand why you want to remove commas from numbers, but not all quoted commas. But that is certainly an easier problem to solve. – dbenham May 16 '16 at 21:47
  • you are right... my mistake! but importing csv to mysql makes things harder with commas in string. In this specific case, getting rid of any comma within a string (between double quotes) would be necessary... – R_life_R May 16 '16 at 21:49
  • Should all quotes be removed as well? – dbenham May 16 '16 at 21:52
  • Yes sir! the quotes should also be removed. Thanks for your great help btw. Would be lost without this. – R_life_R May 16 '16 at 21:54
  • thanks dbenham. I really appreciate your help! greetings from Paris, France – R_life_R May 16 '16 at 22:12
1

May I suggest you a different, simpler solution? The 5-lines Batch file below do what you want; save it with .bat extension:

@set @a=0  /*
@cscript //nologo //E:JScript "%~F0" < "csv/Transactions.csv" > "csv/Transactionsfeed.csv"
@goto :EOF */

WScript.Stdin.ReadLine();
WScript.Stdout.Write(WScript.Stdin.ReadAll().replace(/(\"(\d{1,3}),(\d{3}\.\d{2})\")/g,"$2$3"));

JREPL.BAT is a large and complex program capable of advanced replacement tasks; however, your request is very simple. This code is also a Batch-JScript hybrid script that use the replace method in the same way as JREPL.BAT, but that is tailored to your specific request.

  • The first ReadLine() read the header line of the input file, so the posterior ReadAll() read and process the rest of lines.
  • The regexp (\"(\d{1,3}),(\d{3}\.\d{2})\") define 3 submatches enclosed in parentheses: the first one is the whole number enclosed in quotes, like "1,000.00"; the second submatch is the digits before the comma and the third submatch is the digits after the comma, including the decimal point.
  • The .replace method change the previous regexp, that is, the whole number enclosed in quotes by just the second and third submatches.
Aacini
  • 65,180
  • 12
  • 72
  • 108
  • great solution. The only issue I get (my mistake!) what I mentionned to dbenham in that routine has also to get rid of commas in double quote string, such as "Special Bonus, OK allan" to "Special Bonus OK allan". any idea? – R_life_R May 16 '16 at 21:47
  • What happens if the number is `"1,234,567.89"`. This simplistic solution has an inherent limit to the number magnitude. Although the solution is simpler if all quoted commas are removed. – dbenham May 16 '16 at 21:50
  • thanks for your comment. In this practical example, amounts should not be greater than 999,999.99, but it makes sense... – R_life_R May 16 '16 at 21:52