1

I have a comma-delimited CSV file (,) where commas are escaped by surrounding the data in quotation marks (").

ID,Email,Job Title
1001,wdaelman@example.com,Technician
1002,rfewell@example.com,"Specialist, HRIT"
1003,jcoulbeck@example.com,"Director, Nursing"

I want to convert my CSV to a pipe-delimited file (|) by using Notepad++ to find and replace any commas that aren't enclosed in double quotes (") with a pipe.

ID|Email|Job Title
1001|wdaelman@example.com|Technician
1002|rfewell@example.com|"Specialist, HRIT"
1003|jcoulbeck@example.com|"Director, Nursing"

My first approach was to use a regular expression to match any unquoted commas. However, searching for ("[^"]*")|, in Notepad++ replaced both unquoted commas and any quoted strings that contained a comma.

1002|rfewell@example.com||

How can I convert a comma-delimited CSV file (,) to a pipe-delimited file (|) with Notepad++?

Stevoisiak
  • 23,794
  • 27
  • 122
  • 225

2 Answers2

2

You can use your regex, ("[^"]*")|,, but you need to replace with (?1$1:|).

The pattern matches and captures into Group 1 a ", then any 0+ chars other than " and then again a " (with ("[^"]*")), or (|) just matches a comma (that is, a comma outside of double quoted substrings as those have already been matched with the preceding branch).

The (?1$1:|) replacement pattern means that once Group 1 matched ((?1) the Group 1 value should be put back where it was (see the $1 placeholder), else (:) replace the matched string (i.e. the comma) with a pipe symbol.

enter image description here

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Is it possible to escape data with pipe characters? (IE: `1004,Special|HRIT` → `1004|"Special|HRIT"`) – Stevoisiak Sep 17 '18 at 18:17
  • @StevenVascellaro Do you mean you want to wrap comma sepatated fields that contain `|` with double quotation marks? Try `([^,|\v]*\|[^,\v]*)|,` and replace with `(?1"$1":|)`. – Wiktor Stribiżew Sep 17 '18 at 18:47
2

You may want to try with this:

  1. Wrap data that has | character. First, replace: ,([^"\n,]*\|[^"\n,]*) by ,"\1"
  2. Use | as delimiter: Replace: ,("[^"\n]*"|[^,\n]*) by |\1

NOTE: I'm completelly ignoring first column, since It seems to be an ID that needs no extra processing

Julio
  • 5,208
  • 1
  • 13
  • 42