
I need to remove an extra pipe character at the end of the header row of a pipe-delimited CSV file with sed. The literal string that I am trying to find is COLNAME|

I am working on a GCP Windows server. The command I am trying to use is:

"C:\Program Files (x86)\GnuWin32\bin\sed.exe" sed '0,/COLNAME"|"/s//COLNAME/' FILENAME

It returns the following output: sed.exe: -e expression #1, char 3: unterminated `s' command

I'm new to sed and have been playing around with the s command for a while but cannot seem to get the syntax correct.

Any suggestions on how to accomplish this?

Ryszard Czech
  • So, you want to find and replace `COLNAME"|"` with `COLNAME` in some file, right? – Wiktor Stribiżew Jan 12 '21 at 19:35
  • Well, the literal string that I want to replace is COLNAME|. The " double quotes are around the pipe in the command I'm trying to run because I read that special characters need to be surrounded with double quotes – BrianP_TRHC Jan 13 '21 at 18:42

1 Answer


If you want to replace COLNAME| with COLNAME"|" on Windows using GNU sed, you can use

"C:\Program Files (x86)\GnuWin32\bin\sed.exe" "s/COLNAME|/COLNAME"^""|"^""/g" FILENAME

Here, the pattern COLNAME| matches the literal text COLNAME|, and COLNAME"^""|"^"" produces the literal replacement COLNAME"|": the " after COLNAME ends the quoted string, ^" appends a literal " character to the sed command, "|" appends a | character, the second ^" appends another literal ", and the next " opens the final quoted part. The g flag replaces every occurrence on a line rather than only the first.

If you want to replace COLNAME"|" with COLNAME on Windows using GNU sed, any of the following works:

"C:\Program Files (x86)\GnuWin32\bin\sed.exe" "s/COLNAME"^""|"^""/COLNAME/g" FILENAME
"C:\Program Files (x86)\GnuWin32\bin\sed.exe" "s/COLNAME\x22|\x22/COLNAME/g" FILENAME
"C:\Program Files (x86)\GnuWin32\bin\sed.exe" "s/COLNAME\d34|\d34/COLNAME/g" FILENAME

Mind that you need to enclose the substitution command in double quotes. To match a double quote, you can't simply use a " or a \"; instead, match it with a caret-escaped ^", with \x22 (the hexadecimal representation of the character), or with \d34 (its decimal representation).
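As a quick check of the two escape forms, here is a minimal sketch. It assumes GNU sed is on the PATH; the file name sample_quotes.txt and its contents are made up for illustration. The sed scripts are written in double quotes without any caret escapes, so the same sed invocations should parse identically in cmd.exe and in a POSIX shell (the printf and variable assignments around them are POSIX shell only).

```shell
# Hypothetical sample: the header ends with the literal text COLNAME"|"
printf 'ID|COLNAME"|"\n1|a\n' > sample_quotes.txt

# \x22 is the hexadecimal escape for a double quote (GNU sed extension)
out_hex=$(sed "s/COLNAME\x22|\x22/COLNAME/g" sample_quotes.txt)

# \d34 is the decimal escape for the same character (GNU sed extension)
out_dec=$(sed "s/COLNAME\d34|\d34/COLNAME/g" sample_quotes.txt)

# Show the result of the hexadecimal variant
printf '%s\n' "$out_hex"
```

Both variants avoid the caret quoting entirely, which may be easier to read and maintain than the "^"" form.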

Note that in "s/COLNAME"^""|"^""/COLNAME/g", the sed command is built in the following way:

  1. "s/COLNAME" starts the sed command
  2. ^" appends a literal " character to the sed command
  3. "|" appends the | pipe character
  4. ^" appends another literal " character
  5. "/COLNAME/g" finishes the sed command with the replacement and the global flag.
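For the case described in the question, where only the stray pipe at the end of the header row should go, the substitution can be restricted to line 1. The following is a sketch, assuming GNU sed and a file with CRLF line endings; the file names sample_header.txt and fixed_header.txt and the sample rows are hypothetical.

```shell
# Hypothetical sample: CRLF line endings, stray pipe at the end of the header
printf 'ID|COLNAME|\r\n1|a|b\r\n' > sample_header.txt

# The address "1" limits the substitution to the first line.
# \(\r*\) captures an optional carriage return before the end of the line
# (\r is a GNU sed extension) and \1 puts it back, so only the pipe is removed.
sed "1s/|\(\r*\)$/\1/" sample_header.txt > fixed_header.txt
```

On Windows the same script can be passed to sed.exe in double quotes. Note that sed still copies every remaining line to its output, so the whole file is read and written once; the line address only avoids running the regex on the data rows.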
Wiktor Stribiżew
  • the literal string that I want to replace is COLNAME| the double quotes are around the pipe in the command I'm trying to run as I read that special characters need to be surrounded with double quotes – BrianP_TRHC Jan 13 '21 at 18:54
  • @BrianP_TRHC So, you want to replace `COLNAME|` with `COLNAME"|"`? Then use `"C:\Program Files (x86)\GnuWin32\bin\sed.exe" "s/COLNAME|/COLNAME"^""|"^""/g" FILENAME` – Wiktor Stribiżew Jan 13 '21 at 19:19
  • @BrianP_TRHC See the updated answer, does it work now? – Wiktor Stribiżew Jan 14 '21 at 00:07
  • No, the literal string that I want to replace is COLNAME| with COLNAME – BrianP_TRHC Jan 14 '21 at 18:27
  • @BrianP_TRHC So, there is no issue at all then, `"C:\Program Files (x86)\GnuWin32\bin\sed.exe" "s/COLNAME|/COLNAME/g" FILENAME` works fine. – Wiktor Stribiżew Jan 14 '21 at 18:31
  • The only issue with that syntax is the time it takes to scan the entire file; this will be used on files >500GB, so I would prefer to only find/replace the first occurrence of COLNAME| with COLNAME. Or, if it's easier, just go to the end of the first line and delete the pipe, but I am not familiar enough with scripting to accomplish that. – BrianP_TRHC Jan 14 '21 at 20:22
  • Deleting the pipe on the first line at its end is just `"C:\Program Files (x86)\GnuWin32\bin\sed.exe" "1!b;s/|$//" FILENAME` – Wiktor Stribiżew Jan 14 '21 at 22:00
  • "C:\Program Files (x86)\GnuWin32\bin\sed.exe" "1!b;s/|$//" FILENAME did not seem to work. COLNAME| still remained at the end of the header row upon completion. Also, it still scanned the entire file; this will be used in automation to process files >500 GB daily so scanning the full file will not be sufficient for the task I'm trying to accomplish. – BrianP_TRHC Jan 15 '21 at 15:55
  • @BrianP_TRHC That is probably due to CRLF endings, you might just try `sed "s/\r//g"` before. Or, `"C:\Program Files (x86)\GnuWin32\bin\sed.exe" "1!b;s/|[[:space:]]*$//" FILENAME` or `"C:\Program Files (x86)\GnuWin32\bin\sed.exe" "1!b;s/|\r*$//" FILENAME` – Wiktor Stribiżew Jan 15 '21 at 16:02