sed -c -i -r 's/[^a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' report.csv

I'm using sed for Windows from http://gnuwin32.sourceforge.net/packages/sed.htm

I can't get the escaping right. I have CSV files with lots of junk characters such as NUL, BEL, and several others that I need to strip out. I don't know which ones may be present, so I'm doing the inverse and permitting only the characters I want. The command needs to be called from a script on the Windows command line. sed is a tool I'm reasonably familiar with, and for the most part it works, but I can't get this one working. With the command above, it says "The system cannot find the path specified". If I change the quotes and escape characters I get other errors, so don't get hung up on any particular one. The file referenced is in the same folder the command is executed from.

shadowfoxy
  • Are you running this as a Windows .exe, or in something like Cygwin? – Marc B Jul 06 '16 at 14:27
  • Yes, a Windows sed.exe is in the Windows directory, and this would be called from the Windows shell, separate from a Cygwin environment. – shadowfoxy Jul 06 '16 at 14:34
  • No need for `sed` if you use PowerShell. Use PowerShell instead of `cmd.exe`. – Bill_Stewart Jul 06 '16 at 15:13
  • @shadowfoxy This issue is similar to [gawk command in CMD with && operator not working](http://stackoverflow.com/questions/38170509/). I suggest the following, which I have not tested with `sed` because it is not installed here. With delayed expansion enabled by the command `setlocal EnableDelayedExpansion`, define an environment variable with ``set "Regex='s/[^^a-zA-Z 0-9`~^!@#$%%^^&*()_+\[\]\\{}|;'\'':",.\/^<^>?]//g'"`` and use `sed.exe -c -i -r !Regex! report.csv`, then finish with `endlocal`. With `echo` inserted to the left of `sed.exe`, I at least get the correct command line in the output. – Mofi Jul 06 '16 at 15:24
  • @shadowfoxy By the way: I don't know whether `sed` supports it, but Perl regular expressions usually support characters and character ranges in hexadecimal notation, such as `\x21` for an exclamation mark. For example, `[^\t\n\r\x20-\x7E]` matches everything that is NOT a horizontal tab, line feed, carriage return, or a character from space to tilde in the Unicode table (which here is also the ASCII and ANSI range); `[^\t\n\r -~]` defines the same character class. A simpler regular expression like this also avoids the parsing problems with `cmd.exe`. – Mofi Jul 06 '16 at 15:32
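
The range-based character class from the last comment can be tried on its own in a POSIX shell before fighting with `cmd.exe` quoting. The following is only a minimal sketch: `strip_junk` is a hypothetical helper name, and the tab and carriage return are passed as literal bytes rather than as `\t` and `\r`, on the assumption that not every sed build expands those escapes inside a bracket expression.

```shell
# strip_junk is a hypothetical helper name, not something from the thread.
# It deletes every byte except tab, carriage return, and printable ASCII
# (space through tilde). sed never sees the line feed, so it needs no entry.
strip_junk() {
  tab=$(printf '\t')
  cr=$(printf '\r')
  # LC_ALL=C makes sed compare raw bytes, so " -~" is the ASCII range 0x20-0x7E.
  LC_ALL=C sed "s/[^${tab}${cr} -~]//g"
}

# BEL (\007) and SOH (\001) are dropped; the comma-separated text is kept.
printf 'a\007b,c\001d\n' | strip_junk
```

Note that the range from space to tilde also lets `-` and `=` through, which the whitelist in the question leaves out. Because the class contains none of `&`, `|`, `<`, `>`, `^` after the opening bracket's negation, `%`, or `"` as separate items, it can probably be wrapped in plain double quotes on the Windows command line, but that part is an assumption and is untested here.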

1 Answer


You didn't escape everything properly:

's/[^a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g'
^--start string           end string---^

which makes the \ after that "end string" quote a path separator for cmd.exe, not part of your sed command.
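
For comparison, here is a rough sketch of how a POSIX shell reads the same `'\''` sequence. There, the first `'` after `;` really does end the string, `\'` contributes one literal quote, and the next `'` starts a new string that is glued onto the same argument. `cmd.exe` has no equivalent single-quote handling, so the idiom does not carry over. `show_args` is a hypothetical helper, and the pattern is shortened for readability.

```shell
# show_args is a hypothetical helper: it prints each argument it receives
# on its own line, in brackets, so the shell's word splitting is visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# '...;' closes the first quoted piece, \' adds a literal quote,
# and ':"]//g' reopens the quoting; the shell joins all three into one word.
show_args 's/[^a-z;'\'':"]//g' report.csv
```

In a POSIX shell this yields two arguments, the whole sed expression and the file name, which is why the command behaves under bash or Cygwin but not when typed at the Windows prompt.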

Marc B
  • Yeah, I tried but didn't get anywhere. I escaped what I thought was needed but got the same or different errors; at one point it said "Invalid content of \{\}", so I figured I'd stop and go back before my sed command got too unrecognizable. Is there any online tool that would help with the escaping and syntax? – shadowfoxy Jul 06 '16 at 14:54
  • cmd.exe uses `^` as the escape character. So... yeah. Good luck. It's ugly enough putting a complex pattern into a bash+sed command line; mix in cmd's own crazy escaping rules and you're in for a world of pain. – Marc B Jul 06 '16 at 14:59
  • You can avoid all the pain and at least some of the ugliness by using PowerShell instead. – Bill_Stewart Jul 12 '16 at 15:44