65

grep can't be fed "raw" strings when used from the command-line, since some characters need to be escaped to not be treated as literals. For example:

$ grep '(hello|bye)' # WON'T MATCH 'hello'
$ grep '\(hello\|bye\)' # GOOD, BUT QUICKLY BECOMES UNREADABLE

I was using printf to auto-escape strings:

$ printf '%q' '(some|group)\n'
\(some\|group\)\\n

This produces a bash-escaped version of the string, and using backticks, this can easily be passed to a grep call:

$ grep `printf '%q' '(a|b|c)'`

However, it's clearly not meant for this: some characters in the output are not escaped, and some are unnecessarily so. For example:

$ printf '%q' '(^#)'
\(\^#\)

The ^ character should not be escaped when passed to grep.

Is there a CLI tool that takes a raw string and returns an escaped version that can be used directly as a pattern with grep? If not, how can I achieve this in pure bash?

Micha Wiedenmann
salezica

6 Answers

71

If you want to search for an exact string,

grep -F '(some|group)\n' ...

-F tells grep to treat the pattern as is, with no interpretation as a regex.

(This is often available as fgrep as well.)
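As a minimal sketch of the fixed-string behavior described above (the sample input lines are made up for illustration), the following shows that with -F the parentheses, pipe, and backslash are all matched literally:

```bash
# Three input lines; only the first contains the literal text.
# printf '%s\n' prints its arguments as-is, so "\n" stays a backslash and an "n".
matches=$(printf '%s\n' 'say (some|group)\n here' 'some' 'group' |
    grep -F -- '(some|group)\n')

# Only the line containing the exact text is selected.
printf '%s\n' "$matches"
```

The `--` is not strictly needed here, but it keeps the command safe if the pattern ever starts with a dash.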

ephemient
  • `fgrep` is defined by POSIX, so it should be available, but is technically deprecated. – jordanm Aug 08 '12 at 00:42
  • I tried to make the question clearer, please take another look. – salezica Aug 08 '12 at 00:58
  • 1
    @jordanm A bit harsher than deprecated, even. It's marked LEGACY in POSIX.2, and has not been carried forward to any specification past 1997. – ephemient Aug 08 '12 at 01:02
  • The OP expects `(hello|bye)` to match "hello". So I think this is the answer to the wrong question. – tripleee Aug 08 '12 at 04:58
  • @tripleee That's totally not how I interpreted the question... but I can see that possibility. Well, let's see if your answer is to the right question :) – ephemient Aug 08 '12 at 05:14
  • 2
    That's totally not what I meant :( – salezica Aug 08 '12 at 16:34
  • This doesn't always work. Try e.g: echo "A-B-C" | grep -F "-B-" – LLL Apr 26 '17 at 17:29
  • 1
    @LLL that's because the pattern `-B-` looks like an option (indeed `-B` is a valid grep option), you need to use the `--` special option to terminate option parsing; i.e. `echo A-B-C | grep -F -- -B-` (note that the quotes you used are unnecessary and do nothing in this case, and they are stripped by the shell before invoking grep) – kbolino Apr 03 '18 at 17:44
33

If you are attempting to get grep to use Extended Regular Expression syntax, the way to do that is to use grep -E (aka egrep). You should also know about grep -F (aka fgrep) and, in newer versions of GNU grep, grep -P.

Background: The original grep had a fairly small set of regex operators; it was Ken Thompson's original regular expression implementation. A new version with an extended repertoire was developed later, and for compatibility reasons, got a different name. With GNU grep, there is only one binary, which understands the traditional, basic RE syntax if invoked as grep, and ERE if invoked as egrep. Some constructs from egrep are available in grep by using a backslash escape to introduce special meaning.

Subsequently, the Perl programming language has extended the formalism even further; this regex dialect seems to be what most newcomers erroneously expect grep, too, to support. With grep -P, it does; but this is not yet widely supported on all platforms.

So, in grep, the following characters have a special meaning: ^$[]*.\

In egrep, the following characters also have a special meaning: ()|+?{}. (The braces for repetition were not in the original egrep.) The grouping parentheses also enable backreferences with \1, \2, etc.

In many versions of grep, you can get the egrep behavior by putting a backslash before the egrep specials. There are also special sequences like \<\>.

In Perl, a huge number of additional escapes like \w \s \d were introduced. In Perl 5, the regex facility was substantially extended, with non-greedy matching *? +? etc, non-grouping parentheses (?:...), lookaheads, lookbehinds, etc.

... Having said that, if you really do want to convert egrep regular expressions to grep regular expressions without invoking any external process, try ${regex/pattern/substitution} for each of the egrep special characters; but recognize that this does not handle character classes, negated character classes, or backslash escapes correctly.
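The conversion suggested above could be sketched roughly as follows. The function name `ere_to_bre` is hypothetical, and this version walks the string character by character instead of using `${regex//pattern/substitution}`, to sidestep quoting subtleties in the replacement part. It has the limitations already mentioned (bracket expressions and existing backslash escapes are not handled), and it assumes GNU grep, where `\|`, `\+` and `\?` are extensions to BRE.

```bash
# Hypothetical helper: put a backslash before each ERE-only special
# character so the pattern keeps its meaning under GNU grep's BRE syntax.
# Does NOT handle bracket expressions or existing backslash escapes.
ere_to_bre() {
    local in=$1 out= c i
    for ((i = 0; i < ${#in}; i++)); do
        c=${in:i:1}
        case $c in
            '('|')'|'|'|'+'|'?'|'{'|'}') out+="\\$c" ;;
            *) out+=$c ;;
        esac
    done
    printf '%s\n' "$out"
}

ere_to_bre '(hello|bye)'
# Use the converted pattern with plain grep (GNU grep assumed):
printf '%s\n' 'hello' 'other' | grep -- "$(ere_to_bre '(hello|bye)')"
```

In practice, calling grep -E with the original pattern is simpler; the sketch only illustrates the mechanical translation.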

tripleee
  • 2
    Nice answer. Regular expressions are a powerful tool, but unfortunately many commands implement them differently. – glenn jackman Jun 11 '13 at 00:54
  • [Why are there so many different regular expression dialects?](https://stackoverflow.com/questions/2298007/why-are-there-so-many-different-regular-expression-dialects) is related, though the answers there are less detailed. – tripleee Jan 20 '22 at 06:04
  • Perhaps see also https://stackoverflow.com/questions/18514135/bash-regular-expression-cant-seem-to-match-any-of-s-s-d-d-w-w-etc which has an answer of mine with workarounds if you are trying to use some PCRE features in Bash (or more generally POSIX regular expressions). – tripleee Jan 20 '22 at 06:07
  • https://stackoverflow.com/a/33908887 has some notes around the regular expression support in Python and hence the lineage from Perl and ultimately Henry Spencer's implementation. – tripleee Jan 20 '22 at 06:09
30

When I use grep -E with user provided strings I escape them with this

ere_quote() {
    sed 's/[][\.|$(){}?+*^]/\\&/g' <<< "$*"
}

example run

ere_quote ' \ $ [ ] ( ) { } | ^ . ? + *'
# output
# \\ \$ \[ \] \( \) \{ \} \| \^ \. \? \+ \*

This way you may safely insert the quoted string in your regular expression.

For example, suppose you want to find each line starting with the user's content, and the user provides an awkward string such as .*

userdata=".*"
grep -E -- "^$(ere_quote "$userdata")" <<< ".*hello"
# if you have colors in grep you'll see only ".*" in red
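As a further sketch, the ere_quote function above can also be used to build an alternation out of several literal strings. The array name `words` and the sample strings are made up for illustration.

```bash
# ere_quote as defined in the answer above.
ere_quote() {
    sed 's/[][\.|$(){}?+*^]/\\&/g' <<< "$*"
}

# Hypothetical user-supplied strings that should be matched literally.
words=('a.b' 'c|d' '(e)')

# Join the quoted strings with "|" to form one ERE alternation.
pattern=
for w in "${words[@]}"; do
    pattern+="${pattern:+|}$(ere_quote "$w")"
done
printf '%s\n' "$pattern"

# Select only the lines that are exactly one of the words.
result=$(printf '%s\n' 'axb' 'a.b' 'c|d' 'c' | grep -E -- "^($pattern)\$")
printf '%s\n' "$result"
```

The regex operators here (the grouping parentheses, the joining `|`, and the anchors) are added outside ere_quote, so only the user data is escaped.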
Socowi
Riccardo Galli
  • The character class/set in your ere_quote function is missing the "/" character, as a result it won't escape it. – fholzer Oct 21 '20 at 08:21
  • @fholzer why should "/" be quoted? It has no special meaning in a regexp. If you are using it as delimiter you can use different delimiters, or escape it, but that is different from being evaluated as part of a regexp – Riccardo Galli Oct 26 '20 at 00:34
  • While technically correct, my assumption was that the output of the `ere_quote` function would subsequently be used in e.g. sed. While one could use a different delimiter, whatever delimiter is chosen would again need to be escaped. So, true, while technically the slash holds no special meaning in regex in general, it might be worth noting that when `ere_quote` output is used later in the script with certain tools, it would make sense to amend the character class with the chosen delimiter of those tools, if necessary. – fholzer Oct 26 '20 at 12:39
7

I think the previous answers are incomplete because they miss one important case, namely strings which begin with a dash (-). So while this won't work:

echo "A-B-C" | grep -F "-B-"

This one will:

echo "A-B-C" | grep -F -- "-B-"
LLL
  • `grep` has the `-e` option precisely so you can unambiguously pass in a pattern which starts with a dash. This has nothing to do with regex syntax per se. – tripleee Oct 18 '20 at 19:12
2
quote() {
    sed 's/[^\^]/[&]/g;s/[\^]/\\&/g' <<< "$*"
}

Usage: grep [OPTIONS] "$(quote [STRING])"

This function has some substantial benefits:

  • quote is independent from the regex flavor. You can use quote's output in
    • grep (-G) (BRE, the default)
    • grep -E (ERE)
    • grep -P (PCRE)
    • sed (-E) "s/$(quote [STRING])/.../" (as long as you don't use \, [, or ] instead of /).
  • quote even works in corner cases that are not directly quoting related, for instance
    • Leading - are quoted so that they aren't misinterpreted as options by grep.
    • Trailing spaces are quoted so that they aren't removed by $(...).

quote only fails if [STRING] contains linebreaks. But in general there is no fix for this since tools like grep and sed may not support linebreaks in their search pattern (even if they are written as \n).

Also, there is the drawback that the quoted output usually is three times longer than the unquoted input.
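To make the behavior of quote concrete, here is a small sketch that prints its output for a sample string and uses that output with two grep flavors. The sample strings are made up for illustration, and grep -P is left out because not every grep build supports it.

```bash
# quote as defined in the answer above.
quote() {
    sed 's/[^\^]/[&]/g;s/[\^]/\\&/g' <<< "$*"
}

# Every character except \ and ^ is wrapped in a bracket expression;
# \ and ^ get a backslash instead.
quote 'a.b^c'

# The same quoted pattern works with BRE (-G) and ERE (-E).
for flavor in -G -E; do
    printf '%s\n' 'a.b^c' 'axb^c' | grep "$flavor" "$(quote 'a.b^c')"
done

# A leading dash is wrapped too, so no -- or -e is needed.
dash_match=$(echo 'A-B-C' | grep "$(quote '-B-')")
printf '%s\n' "$dash_match"
```

Each grep call in the loop should select only the line a.b^c, since the dot is matched literally.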

Socowi
0

I just want to comment on the example below, in which grep interprets the substring "-B" as a command-line option, so the command fails.

echo "A-B-C" | grep -F "-B-"

grep has a special option for this case:

-e PATTERNS, --regexp=PATTERNS Use PATTERNS as the patterns. If this option is used multiple times or is combined with the -f (--file) option, search for all patterns given. This option can be used to protect a pattern beginning with “-”.

So a fix for the issue is:

echo "A-B-C" | grep -F -e "-B-" -
trunikov