17

I am using the Unix sed command on a string that can contain all types of characters (&, |, !, /, ?, etc).

Is there a complex delimiter (with two characters?) that can fix the error:

sed: -e expression #1, char 22: unknown option to `s'
Unitech

8 Answers

24

The characters in the input file are of no concern - sed parses them fine. There may be an issue, however, if you have most of the common characters in your pattern - or if your pattern may not be known beforehand.

At least on GNU sed, you can use a non-printable character that is highly improbable to exist in your pattern as a delimiter. For example, if your shell is Bash:

$ echo '|||' | sed s$'\001''|'$'\001''/'$'\001''g'

In this example, Bash replaces $'\001' with the character that has the octal value 001 - in ASCII it's the SOH character (start of heading).

Since such characters are control/non-printable characters, it's doubtful that they will exist in the pattern. Unless, that is, you are doing something weird like modifying binary files - or Unicode files without the proper locale settings.
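As a minimal sketch of the same idea with the delimiter held in a variable (the names `delim`, `pattern`, `replacement`, and `result` are made up for illustration, and GNU sed is assumed, as above): the pattern may then contain `/`, `|`, or `@` freely, although it is still interpreted as a regular expression.

```shell
# SOH (octal 001) as the delimiter; in Bash, delim=$'\001' is equivalent.
delim=$(printf '\001')
pattern='a/b|c'        # contains common delimiter characters
replacement='x@y'
result=$(printf '%s\n' 'see a/b|c here' |
    sed "s${delim}${pattern}${delim}${replacement}${delim}g")
printf '%s\n' "$result"
```

This only moves the problem: if the pattern itself can contain SOH, the expression breaks again.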

thkala
  • Non-printable characters can appear in the search pattern, but thanks for the tip – Unitech Jan 30 '11 at 19:29
  • 1
    “Highly improbable” isn’t the same as impossible. You cannot do matches in sed without *some* delimiter, which will always cause a problem if that delimiter is in the pattern. In Perl, though, you can. – tchrist Jan 30 '11 at 19:34
  • This works very well. @Tknew - What type of text are you dealing with? This solution is nearly perfect for most data sets. Are you dealing with text or binary? The null byte is safe for processing file names, but for actual text data there really is no character that is guaranteed to be safer than the non-printables. I have a solution to this problem in a script I wrote, which involves searching and replacing the text until a unique sentinel is found. That is all I know of that can be done for this problem at the moment. If you want, you can check the logic in my script: http://goo.gl/0bypeu –  Mar 08 '15 at 20:25
  • 2
    @thkala, you can simplify and generalize it by using double quotes and variables, note that the LHS regex / / prior to a command on my sed needs to be escaped, so you need to still bring that outside like you have it.. E.g. `STATEMENT="THIS STATEMENT NEEDS THIS / AND NOTHING ELSE REMOVED"; DELIM=$(echo -en "\001"); echo "$STATEMENT" | sed $'\x5c'"${DELIM}^THIS STATEMENT${DELIM}{s${DELIM}THIS / AND ${DELIM}${DELIM}}"` –  Mar 08 '15 at 20:51
2

Another way to do this is to use Shell Parameter Substitution.

${parameter/pattern/replace}  # substitute replace for pattern once

or

${parameter//pattern/replace}  # substitute replace for pattern everywhere

Here is a quite complex example that is difficult with sed:

$ parameter="Common sed delimiters: [sed-del]"
$ pattern="\[sed-del\]"
$ replace="[/_%:\\@]"
$ echo "${parameter//$pattern/$replace}"

result is:

Common sed delimiters: [/_%:\@]

However, this only works on bash variables, not on files, where sed excels.
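A small sketch of one more detail, assuming Bash (the variable names are illustrative only): quoting the pattern inside the expansion makes Bash match it literally rather than as a glob, so no manual escaping of `[`, `*`, or `]` is needed.

```shell
text='keep [a*b], drop [a*b]'
pattern='[a*b]'        # unquoted, this would be a glob bracket expression
replacement='(x/y)'    # slashes in the expanded value are not re-parsed as separators
# "$pattern" is quoted inside the expansion, so it is matched literally.
result=${text//"$pattern"/$replacement}
printf '%s\n' "$result"
```

The replacement here deliberately avoids `&`, which newer Bash versions may treat specially in the replacement string.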

javabeangrinder
  • Down vote for really late tag-on-for-rep irrelevant answer that ignores the question explicitly requiring use of sed and file input. – Gordon Apr 20 '16 at 18:51
  • 1
    Well Gordon, good for you. However I got to this page when having a sed problem similar to the one presented here. This page gave me no solution to my problem but the parameter substitution did. Since I believe that others might get to this page not knowing that parameter substitution could be the answer to their problem I consider my answer a contribution. In my humble opinion stackoverflow isn't here just to answer those who ask the question but also others that hunt for a solution to a specific problem. My answer is just that. – javabeangrinder Apr 21 '16 at 10:22
  • Doesn't change that this is not an answer to the question here. FYI, you can create your own question and answer it yourself for the sake of sharing a solution you found to some problem when existing questions don't apply. – Gordon Apr 23 '16 at 03:48
  • Thank you @javabeangrinder. I was stuck thinking I needed sed and realized I could do better without it. I too believe your answer is very fitting here. Maybe it's not exactly what OP was asking, but it is very much related and addresses very similar use cases. – joseLuís Nov 20 '16 at 21:12
  • 1
    @javabeangrinder I like your answer, thank you. I have edited a typo in it – maoizm Aug 05 '18 at 17:28
  • While we're slightly off-topic... Technically, this _can_ be used for files as well: `while read; do printf "${REPLY//a/b}\n"; done <infile >outfile` – dannysauer Mar 16 '21 at 18:26
1

Wow. I totally did not know that you could use any character as a delimiter. At least half the time I use sed and BREs, it's on paths, code snippets, junk characters, things like that. I end up with a bunch of horribly unreadable escapes which I'm not even sure won't die on some combination I didn't think of. But if you can exclude just some character class (or even just one character), you can use one of those characters as the delimiter:

echo '#01Y $#1+!' | sed -e 'sa$#1+acowa' -e 'su#01YuHolyug'

> > > Holy cow! That's so much easier.

iownbey
Geoff Nixon
1

Escaping the delimiter inline for BASH to parse is cumbersome and difficult to read (although the delimiter does need escaping for sed's benefit when it's first used, per-expression).

To pull together thkala's answer and user4401178's comment:

DELIM=$(echo -en "\001");
sed -n "\\${DELIM}${STARTING_SEARCH_TERM}${DELIM},\\${DELIM}${ENDING_SEARCH_TERM}${DELIM}p" "${FILE}"

This example prints every line from the first match of ${STARTING_SEARCH_TERM} through the next match of ${ENDING_SEARCH_TERM}. It works as long as neither search term contains the SOH (start of heading) character, ASCII code 001.
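A self-contained sketch of that range print, assuming GNU sed; the sample input and the variable names `start`, `end`, and `result` are invented for illustration. Both search terms contain `/`, which would break the usual `/start/,/end/p` form.

```shell
delim=$(printf '\001')   # SOH; same value as the DELIM above
start='/begin/'
end='/end/'
# \<delim>regex<delim> is sed's syntax for an address with a custom delimiter.
result=$(printf '%s\n' 'skipped' 'x /begin/ x' 'body' 'y /end/ y' 'skipped too' |
    sed -n "\\${delim}${start}${delim},\\${delim}${end}${delim}p")
printf '%s\n' "$result"
```

Only the three lines from the one containing `/begin/` through the one containing `/end/` should be printed.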

Andy
1

There is no such option for multi-character expression delimiters in sed, but I doubt you need that. The delimiter character should not occur in the pattern, but if it appears in the string being processed, it's not a problem. And unless you're doing something extremely weird, there will always be some character that doesn't appear in your search pattern that can serve as a delimiter.

Jim Lewis
  • 3
    I'm doing something extremely weird, yes. I'm testing all type of caracters. – Unitech Jan 30 '11 at 19:22
  • @tknew: Only Perl but not sed offers matches that are independent of a delimiter. Since Perl is a proper superset of sed, this may suffice. – tchrist Jan 30 '11 at 19:33
  • I ran into this problem the other day, and I don't think what I'm doing is extremely weird: I was trying to delete a line containing an arbitrary string `$STR`, e.g. `sed -i -e '/'"$STR"'/d' $FILE`. Or is there a better idiom for the above? – Leo Alekseyev Feb 11 '11 at 06:30
1

You need the nested delimiter facility that Perl offers. It lets you match, substitute, and transliterate without worrying about the delimiter being included in your contents. Since perl is a superset of sed, you should be able to use it for whatever you're using sed for.

Consider this:

$ perl -nle 'print if /something/' inputs

Now if your something contains a slash, you have a problem. The way to fix this is to change the delimiter, preferably to a bracketing one. So for example, you could have anything you like in the $WHATEVER shell variable (provided the brackets are balanced), which gets interpolated by the shell before Perl is even called here:

$ perl -nle "print if m($WHATEVER)" /usr/share/dict/words

That works even if you have correctly nested parens in $WHATEVER. The four bracketing pairs which correctly nest like this in Perl are < >, ( ), [ ], and { }. They allow arbitrary contents that include the delimiter if that delimiter is balanced.

If it is not balanced, then do not use a delimiter at all. If the pattern is in a Perl variable, you don’t need to use the match operator provided you use the =~ operator, so:

$whatever = "some arbitrary string ( / # [ etc";
if ($line =~ $whatever) { ... }
tchrist
1

With the help of Jim Lewis, I finally did a test before using sed :

# Use '@' as the sed delimiter when the search term contains '|', otherwise '|'.
if printf '%s\n' "$1" | grep -q '|'; then
    grep ".*$1.*:" "$DB_FILE" | sed "s@^.*$1*.*\(:\)@@ "
else
    grep ".*$1.*:" "$DB_FILE" | sed "s|^.*$1*.*\(:\)|| "
fi

Thanks for help

Unitech
  • Yeah I was gonna post a solution that does logic of this sort: basically, you could make a loop over all characters that will semi-exhaustively determine the first character not present in your proposed search string, so then the only case it would fail is if your search string contained *all* possible characters, which is a pretty absurd situation. – Steven Lu Feb 06 '13 at 04:58
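The loop described in that comment could look roughly like this. It is only a sketch: `pick_delim` and its candidate list are hypothetical, not taken from any answer here, and the search string is still interpreted by sed as a regular expression.

```shell
# Print the first candidate delimiter that does not occur in the given text.
pick_delim() {
    soh=$(printf '\001')
    for c in '/' '|' '@' '#' ',' ':' '%' '~' "$soh"; do
        case $1 in
            *"$c"*) ;;                        # candidate occurs in the text, try the next
            *) printf '%s' "$c"; return 0 ;;  # free to use as the delimiter
        esac
    done
    return 1                                  # every candidate is taken
}

pattern='a/b|c@d'
replacement='Z'
# Check pattern and replacement together, since both sit between delimiters.
delim=$(pick_delim "$pattern$replacement") || exit 1
result=$(printf '%s\n' 'x a/b|c@d y' |
    sed "s${delim}${pattern}${delim}${replacement}${delim}")
printf '%s\n' "$result"
```

Here `/`, `|`, and `@` are all taken, so `#` is chosen. The function fails only if the text contains every candidate.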
1

There's no universal separator, but any separator can be escaped with a backslash so that sed does not treat it as a separator (unless you choose the backslash itself as the separator).

Depending on the actual application, it might be handy to just escape those characters in both pattern and replacement.

If you're in a bash environment, you can use bash pattern substitution to escape the sed separator, like this:

safe_replace () {
    sed "s/${1//\//\\\/}/${2//\//\\\/}/g"
}

It's pretty self-explanatory, except for the bizarre-looking expansion, which breaks down as follows:

${1//\//\\\/}
${            - bash expansion starts
  1           - first positional argument - the pattern
   //         - bash pattern substitution pattern separator "replace-all" variant
     \/       - literal slash
       /      - bash pattern substitution replacement separator
        \\    - literal backslash
          \/  - literal slash
            } - bash expansion ends

example use:

$ input="ka/pus/ta"
$ pattern="/pus/"
$ replacement="/re/"
$ safe_replace "$pattern" "$replacement" <<< "$input"
ka/re/ta
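Note that safe_replace escapes only the slash, so the pattern is still a regular expression. If both strings should be taken literally, the same escaping idea can be extended to the other special characters. The sketch below is one way to do that, not part of the answer above: the helper names are hypothetical, and it assumes GNU sed, `/` as the separator, and single-line strings.

```shell
# Backslash-escape the characters that are special in a BRE pattern, plus '/'.
escape_pattern() {
    printf '%s' "$1" | sed 's|[[\\/.*^$]|\\&|g'
}

# Backslash-escape the characters that are special in a replacement, plus '/'.
escape_replacement() {
    printf '%s' "$1" | sed 's|[\\/&]|\\&|g'
}

# Replace every literal occurrence of $1 with the literal string $2 on stdin.
literal_replace() {
    sed "s/$(escape_pattern "$1")/$(escape_replacement "$2")/g"
}

result=$(printf '%s\n' 'cost: $5.00/kg [net]' |
    literal_replace '$5.00/kg [net]' 'R&D \ 1/2')
printf '%s\n' "$result"
```

The helpers use `|` as their own separator so that the slash can sit inside the bracket expression without extra escaping.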
Krzysztof Jabłoński