
Say I have this text file (lorem.txt):

Lorem ipsum dolor sit amet, consectetur
adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna
aliqua.

With grep I can easily find the line containing eiusmod:

$ grep eiusmod lorem.txt
adipiscing elit, sed do eiusmod tempor

By using a context option such as -C, I can even get the lines surrounding the match:

$ grep -C1 eiusmod lorem.txt
Lorem ipsum dolor sit amet, consectetur
adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna

This is good. But what if I only want to see a few characters on either side of the match, on the same line, rather than the full line? Something like this:

$ grep --char-context=3 eiusmod lorem.txt
do eiusmod te
$ grep -n --char-context=5 dol lorem.txt
1:psum dolor si
3:e et dolore m

I could of course do this with some clever sed, awk or other tool:

$ sed -n '/dol/{=;s/.*\(...dol...\).*/\1/p}' lorem.txt | sed 'N;s/\n/:/'
1:um dolor 
3:et dolore

But that is not what I want. It's too complicated and obscure to be usable on a day-to-day basis. So is there a simpler way or tool to achieve this?

This is mainly a problem when running a recursive grep over files with very long lines, such as minified CSS or other files containing long runs of text without newlines. I first started thinking about this when using git grep, so a solution that works for both plain grep and git grep is preferred.

Note also that a grep-pipe-sed construct is undesirable since that will remove any highlight/colorisation of the match.

Dima Chubarov
UlfR
  • I suspect you are asking about range quantifiers, e.g. `grep -o '.\{0,3\}eiusmod.\{0,3\}' lorem.txt`, see [this `grep` demo](https://ideone.com/ZIs8W1) – Wiktor Stribiżew Aug 29 '19 at 13:08
  • I did not find the `-o` option earlier. Perfect. It's close enough for me. But it's not working with `git grep`, I think. – UlfR Aug 29 '19 at 13:12
  • @UlfR the suggestion by Wiktor prints the match and three characters of context on either side. How is this not what you wanted? Oh, I see, you want the context to not be coloured! – joanis Aug 29 '19 at 13:15
  • How about `grep -o '.\{0,3\}eiusmod.\{0,3\}' | grep --color eiusmod`? – joanis Aug 29 '19 at 13:17
  • Did you try `git grep -E --all-match '.{0,3}eiusmod.{0,3}' lorem.txt` or `git grep -E --all-match '.{0,3}eiusmod.{0,3}' lorem.txt | grep --color eiusmod`? – Wiktor Stribiżew Sep 16 '19 at 12:17
  • Are you still looking for solutions beyond what Wiktor suggested? – dash-o Nov 22 '19 at 15:02
  • @dash-o it would be nice to wrap it up in an alias for both `grep` and `git grep`. But I haven't been able to get that working, so please give it a go. – UlfR Nov 22 '19 at 15:47

2 Answers


A solution based on Wiktor Stribiżew's comment above.

You can create a small script, 'grep-cxt', that takes two mandatory parameters (the number of context characters to show on each side of the match, and the pattern) followed by an optional list of files (default: stdin).

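#!/bin/bash
# Usage: grep-cxt COUNT PATTERN [FILE...]
count=$1
pattern=$2
shift 2
# -o prints only the matched text: the pattern plus up to COUNT characters on each side.
# (--all-match is a git grep option, not a plain grep one, so it is not used here.)
grep -oE -e ".{0,$count}$pattern.{0,$count}" "$@"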
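
The comments ask for something that can be wrapped up for both `grep` and `git grep`. As a rough sketch only, the same idea could be written as two shell functions. The names `cgrep` and `gcgrep` are made up for illustration, and the `git grep` variant assumes a Git version whose `git grep` supports `-o`/`--only-matching`:

```shell
#!/bin/bash
# cgrep N PATTERN [ARGS...]: print each match with up to N characters of
# context on either side. (Hypothetical helper name.)
cgrep() {
    local n=$1 pattern=$2
    shift 2
    grep -oE -e ".{0,$n}$pattern.{0,$n}" "$@"
}

# gcgrep N PATTERN [ARGS...]: the same idea for git grep.
# Assumption: this Git's `git grep` understands -o/--only-matching.
gcgrep() {
    local n=$1 pattern=$2
    shift 2
    git grep -oE -e ".{0,$n}$pattern.{0,$n}" "$@"
}

# Demo on the second line of lorem.txt, read from stdin.
printf '%s\n' 'adipiscing elit, sed do eiusmod tempor' | cgrep 3 eiusmod
```

Functions are used rather than aliases because an alias cannot place its arguments in the middle of the regular expression. With GNU grep, extra options such as `-n` can be given among the trailing arguments, e.g. `cgrep 5 dol -n lorem.txt`.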
dash-o
grep -noE '.{,4}dolor.{,4}' lorem.txt

It returns:

1:sum dolor sit
3: et dolore ma
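
One portability note: the `{,4}` form with an omitted lower bound is a GNU extension, so writing `{0,4}` is the safer spelling. The sketch below recreates the sample file in a temporary directory (an assumption made only to keep the example self-contained), runs the portable form, and then pipes the result through a second `grep` to highlight just the search term again, as suggested in the comments:

```shell
# Recreate the sample file from the question in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/lorem.txt" <<'EOF'
Lorem ipsum dolor sit amet, consectetur
adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna
aliqua.
EOF

# Explicit lower bound {0,4} instead of the GNU-style {,4}.
plain=$(grep -noE '.{0,4}dolor.{0,4}' "$tmpdir/lorem.txt")
printf '%s\n' "$plain"

# Optional second pass that colours only the search term, not the context.
printf '%s\n' "$plain" | grep --color=always 'dolor'

rm -r "$tmpdir"
```

The first `printf` prints the same two lines as shown above. The second pass prints them again with `dolor` wrapped in colour escape sequences.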
Dmitriy