Grep characters before and after match?

Question

Using this:

grep -A1 -B1 "test_pattern" file

will produce one line before and after the matched pattern in the file. Is there a way to display not lines but a specified number of characters?

The lines in my file are pretty big so I am not interested in printing the entire line but rather only observe the match in context. Any suggestions on how to do this?

Duplicate of https://unix.stackexchange.com/q/163726 Near duplicate of https://stackoverflow.com/q/2034799 — sondra.kinsey, Sep 15 '19 at 17:29

score 275 · Accepted Answer · answered Nov 12 '11 at 01:19

3 characters before and 4 characters after

$> echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}'
23_string_and

ДМИТРИЙ МАЛИКОВ


A good answer for small amounts of data, but it starts getting slow when you are matching >100 characters - e.g. in my giant xml file, I want {1,200} before and after, and it is too slow to use. – Benubird Oct 18 '13 at 11:27
The awk version by @amit_g is much faster. – ssobczak Jul 04 '14 at 12:46
Not available on Mac OSX, so really this is not a widely available solution. The -E version (listed below) is a better solution. What is -P? Read on ... -P, --perl-regexp Interpret PATTERN as a Perl regular expression (PCRE, see below). This is highly experimental and grep -P may warn of unimplemented features. – Xofo Nov 19 '14 at 23:50
Inexplicably, for me, this prints a certain number of lines of beautiful output, then says "Aborted", every time the same number of lines, which depends on what I'm searching for, but is never the full number of matches, by far. bash 4.1.2(1) and grep 2.6.3, CentOS 6.5. – Kev Jul 23 '15 at 13:09
The -E version below does not have this trouble, for some reason. Also, if I search for something that doesn't exist, I get only the `Aborted` line. – Kev Jul 23 '15 at 13:17
On OSX install via: `brew install homebrew/dupes/grep` and run it as `ggrep`. – kenorb Dec 21 '15 at 22:01
As implied by @Benubird this will be performance-wise impossible to use for huge files with moderately wide surroundings desired for the match target. – matanster Jan 06 '18 at 09:43
not working for me bash-5.1$ echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}' grep: unrecognized option: P – GKP Mar 24 '22 at 20:54
Thank you for this! I wrote a quick script "grep_4CharsSurroundingResults.sh" so I never forget again :) (Sorry for formatting) `#!/bin/bash` `grep -o -P '.{0,3}'$1'.{0,4}' $2` – Jay Aug 25 '23 at 18:44

score 167 · Answer 2 · answered Nov 12 '11 at 01:26

grep -E -o ".{0,5}test_pattern.{0,5}" test.txt

This will match up to 5 characters before and after your pattern. The -o switch tells grep to only show the match and -E to use an extended regular expression. Make sure to put the quotes around your expression, else it might be interpreted by the shell.
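
As a small illustration of why the lower bound in `{0,5}` matters, here is a sketch using a made-up sample line (the variable name `line` and its contents are assumptions for the demo). With `{0,5}` the match still succeeds when fewer than 5 characters precede the pattern; with `{5}` it would not.

```shell
# Sample input: only 4 characters come before "test_pattern".
line='abc_test_pattern_xyz123'

# Up to 5 characters of context on each side: the match succeeds and
# takes whatever context is available (4 before, 5 after).
printf '%s\n' "$line" | grep -E -o '.{0,5}test_pattern.{0,5}'

# Exactly 5 characters required before the pattern: there are only 4,
# so grep finds nothing and exits non-zero.
printf '%s\n' "$line" | grep -E -o '.{5}test_pattern.{0,5}' || echo 'no match'
```

So the first number is the minimum number of characters the dot must consume, and a minimum of 0 keeps matches near the start or end of a line from being dropped.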

ekse


Good answer, interesting that it's capped at 2^8-1 for length in the {} so `{0,255}` works `{0,256}` gives `grep: invalid repetition count(s)` – CodeMonkey Apr 05 '18 at 23:02
This seems to get considerably less performant as I increase the number of matching chars (5 -> 25 ->50), any idea why? – Adam Hughes Jan 09 '20 at 17:15

amit_g · Answer 3 · 2011-11-12T01:35:15.497

49

You could use

awk '/test_pattern/ {
    match($0, /test_pattern/); print substr($0, RSTART - 10, RLENGTH + 20);
}' file
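
Note that the one-liner above reports only the first match on each line, and if the match starts within the first 10 characters the negative start position shortens the trailing context. A rough sketch of one way around both, assuming a hypothetical helper named `ctx_awk` (the name, its argument order, and the sample input are mine, not part of this answer):

```shell
# ctx_awk PATTERN N [FILE...]
# Hypothetical helper: print every match of PATTERN with up to N characters
# of context on each side. Reads stdin when no FILE is given.
ctx_awk() {
    pat=$1 n=$2
    shift 2
    awk -v pat="$pat" -v n="$n" '
    {
        line = $0
        while (match(line, pat)) {
            if (RLENGTH == 0) break        # avoid looping forever on an empty match
            start = RSTART - n
            if (start < 1) start = 1       # clamp so trailing context is not shortened
            print substr(line, start, (RSTART - start) + RLENGTH + n)
            line = substr(line, RSTART + RLENGTH)   # continue after this match
        }
    }' "$@"
}

printf '%s\n' 'aaa_test_pattern_bbb_test_pattern_ccc' | ctx_awk 'test_pattern' 3
```

One limitation of this sketch: because the line is cut after each match, the leading context of a later match never reaches back past the end of the previous match. The pattern is also passed through `-v`, so awk processes backslash escapes in it.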

edited Nov 12 '11 at 01:35

answered Nov 12 '11 at 01:17

amit_g


Works nicely even with somewhat bigger files also – Touko Mar 24 '15 at 08:02
how can you use this to find multiple matches per line? – koox00 Jun 02 '16 at 11:38
What's the significance of the first number in the curly-bracketed pairs? Like the 0s in "grep -E -o ".{0,5}test_pattern.{0,5}" test.txt "? – Lew Rockwell Fan Jun 23 '17 at 02:28
It's really faster but not as accurate as @ekse's answer. – Abdollah Aug 11 '19 at 11:57
It's not at all accurate for large files. For a 5.5GB file that I'm sure has millions of matches, this command returned one result. – duplex143 Jun 07 '23 at 14:54

score 33 · Answer 4 · answered Nov 12 '11 at 01:20

You mean, like this:

grep -o '.\{0,20\}test_pattern.\{0,20\}' file

?

That will print up to twenty characters on either side of test_pattern. The \{0,20\} notation is like *, but specifies zero to twenty repetitions instead of zero or more. The -o says to show only the match itself, rather than the entire line.

ruakh


This command is not working for me: `grep: Invalid content of \{\}` – Alexander Pravdin Mar 09 '17 at 02:37
@AlexanderPravdin I think he's assuming that grep is using BRE (so no -E and no -P): `echo zzzabczzzz | grep -o '.\{0,20\}abc.\{0,20\}'`. If it's ERE then the syntax is easier: `echo zzzabczzzz | grep -o -E '.{0,20}abc.{0,20}'`. Likewise, PCRE uses the same syntax as ERE: `echo zzzabczzzz | grep -o -P '.{0,20}abc.{0,20}'`. You can also do `echo zzzabczzzz | grep -o -P '.abc..'`, adding or removing as many dots as you want. – barlop Aug 22 '22 at 13:33

score 3 · Answer 5 · edited Sep 04 '21 at 07:23

I'll never easily remember these cryptic command modifiers so I took the top answer and turned it into a function in my ~/.bashrc file:

cgrep() {
    # For files whose lines run to tens of thousands of characters,
    # print 30 characters before and after the search pattern.
    if [ $# -eq 2 ] ; then
        # Format was 'cgrep "search string" /path/to/filename'
        grep -o -P ".{0,30}$1.{0,30}" "$2"
    else
        # Format was 'cat /path/to/filename | cgrep "search string"'
        grep -o -P ".{0,30}$1.{0,30}"
    fi
} # cgrep()

Here's what it looks like in action:

$ ll /tmp/rick/scp.Mf7UdS/Mf7UdS.Source

-rw-r--r-- 1 rick rick 25780 Jul  3 19:05 /tmp/rick/scp.Mf7UdS/Mf7UdS.Source

$ cat /tmp/rick/scp.Mf7UdS/Mf7UdS.Source | cgrep "Link to iconic"

1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri

$ cgrep "Link to iconic" /tmp/rick/scp.Mf7UdS/Mf7UdS.Source

1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri

The file in question is one continuous 25K line and it is hopeless to find what you are looking for using regular grep.

Notice the two ways you can call cgrep, which parallel how grep itself can be called.

There is a "niftier" way of creating the function where "$2" is only passed when set which would save 4 lines of code. I don't have it handy though. Something like ${parm2} $parm2. If I find it I'll revise the function and this answer.
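
One possible shape for that shorter version, as a sketch rather than the definitive form: take the pattern from the first argument, `shift` it away, and pass whatever remains with `"$@"`. When no file is given, `"$@"` expands to nothing and grep reads standard input. I have swapped in `-E` for `-P` here so the sketch also runs where grep lacks PCRE support; for this pattern the two behave the same, so switch back if you prefer.

```shell
# Hypothetical shorter variant of cgrep (assumption: -E used instead of -P).
cgrep() {
    local pattern=$1
    shift                        # remaining arguments, if any, are file names
    grep -o -E ".{0,30}${pattern}.{0,30}" "$@"
}

# Both call styles go through the same grep line:
printf '%s\n' 'xxxxneedleyyyy' | cgrep 'needle'
```

A side effect of using `"$@"` is that several files can be passed at once, in which case grep prefixes each result with its file name.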

score 1 · Answer 6 · edited Oct 06 '22 at 09:24

If using ripgreg this is how you would do it:

grep -E -o ".{0,5}test_pattern.{0,5}" test.txt

edited Oct 06 '22 at 09:24

Andreas Louv


answered Sep 30 '22 at 10:15

Jeff


You meant ripgrep, I suppose. I'd like to know, how is it different from grep? Your answer seems to be the exact same answer as ekse, except for the ripgrep specification. – chrslg Oct 07 '22 at 08:56

P.... · Answer 7 · 2017-03-28T12:49:44.573

With gawk, you can use the match function:

    x="hey there how are you"
    echo "$x" |awk --re-interval '{match($0,/(.{4})how(.{4})/,a);print a[1],a[2]}'
    ere   are

If you are OK with perl, here is a more flexible solution. The following prints three characters before the pattern, then the pattern itself, then five characters after it.

echo hey there how are you |perl -lne 'print "$1$2$3" if /(.{3})(there)(.{5})/'
ey there how

This can also be applied to words instead of characters. The following prints one word before the matching string.

echo hey there how are you |perl -lne 'print $1 if /(\w+) there/'
hey

The following prints one word after the pattern:

echo hey there how are you |perl -lne 'print $2 if /(\w+) there (\w+)/'
how

The following prints one word before the pattern, then the pattern itself, then one word after it:

echo hey there how are you |perl -lne 'print "$1$2$3" if /(\w+)( there )(\w+)/'
hey there how

score 0 · Answer 8 · answered Jun 14 '22 at 21:10

With ugrep you can specify -ABC context with option -o (--only-matching) to show the match with extra characters of context before and/or after the match, fitting the match plus the context within the specified -ABC width. For example:

ugrep -o -C30 pattern testfile.txt

gives:

     1: ... long line with an example pattern to match.  The line could...
     2: ...nother example line with a pattern.

Multiple matches on a line are either shown with [+nnn more], or, with option -k (--column-number), shown individually with context and the column number.

The context width is the number of Unicode characters displayed (UTF-8/16/32), not just ASCII.

barlop · Answer 9 · 2022-08-22T13:53:05.180

I do something similar to the posted answers, but I often don't need much context (if I needed more I might use lines, as with grep -C, but like you I often don't want whole lines before and after). The dot key, like any key, can be tapped or held down, so I find it quicker to type one dot per character of context: tap it a few times for a few characters, or hold it down for more.

e.g. echo zzzabczzzz | grep -o '.abc..'

This prints the abc pattern with one character before and two after (in regex syntax, a dot matches any character). Others used the dot too, but with curly braces to specify the repetition.

If I wanted to be strict about allowing between 0 and x characters, or requiring exactly y characters, then I'd use the curly braces and -P, as others have done.

There is a setting for whether the dot matches a newline, which you can look into if that is a concern.

score -1 · Answer 10 · answered Jul 29 '19 at 10:27

You can use one grep with a regex to find the match and its context, then a second grep to highlight the pattern:

echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}' | grep string

23_string_and

Andrew Zhilin

