2

Let's say my text file looks like this:

Person1 : movie1
(space and tab) : movie 2
(space and tab) : movie 3
(space and tab) : movie 4

For a particular movie, I want to find the actor. Here is how I am going about it.

Do a grep: cat actors | grep 'movie3'

This gives me line 3, which is blank up until movie3 appears. So if I can somehow get the first line before this one that matches this pattern

grep '^[^ \t].' (does not start with a space)

it has to be the line with the name of the actor in this movie. (I don't care about movie one there.)

Is there any combination of sed/grep/awk which can help me do it in shell? I hope the question is clear.
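
The idea described above can be sketched directly with grep, head and tail. This is only an illustration under a few assumptions: the function name actor_line_for is made up, the title is matched as a fixed string, and only the first line containing the title is considered.

```shell
# actor_line_for is a hypothetical helper, not a standard command.
# $1 = movie title (matched literally), $2 = file
actor_line_for() {
    # line number of the first line that contains the title
    n=$(grep -n -F -- "$1" "$2" | head -n 1 | cut -d: -f1)
    [ -n "$n" ] || return 1
    # among the lines up to that one, keep the last that does not start with whitespace
    head -n "$n" "$2" | grep '^[^[:space:]]' | tail -n 1
}
```

Called as actor_line_for 'movie 3' actors, this prints the whole Person1 line for the sample file; the name still has to be cut out of it.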

Max

4 Answers

3

Bill Murray <- Groundhog Day <- grep with Perl mode Magic

It's a bit tricky, but you can use this:

grep -P "(?sm)^\S+[^:\r\n]*?(?=\s*:(?:(?!^\S).)*?Groundhog Day)" mymoviefile


  • -P activates Perl mode
  • (?sm) turns on two mode modifiers:
      • s activates DOTALL mode, allowing the dot to match across lines
      • m turns on multi-line mode, allowing ^ and $ to match on each line
  • The ^ anchor asserts that we are at the beginning of the line
  • \S+ matches one or more non-space chars
  • [^:\r\n]*? lazily matches any non-colon, non-newline chars, up to ...
  • the point where the lookahead (?=\s*:(?:(?!^\S).)*?Groundhog Day) can assert, without consuming chars, that what follows is...
  • \s*: optional spaces and a colon
  • then (?:(?!^\S).)*? lazily matches zero or more chars, none of which is a non-space char at the beginning of a line, up to...
  • Groundhog Day the movie title!


zx81
  • I tried running it. It did not work. Here is the error message: grep: unrecognized character after (? or (?-. I am trying to debug it, but since it is very complex and I don't know half the things you have used here, I think I will need your further help. :^D – Max Jun 29 '14 at 11:03
  • Added tweak and tweak, have a look. :) – zx81 Jun 29 '14 at 11:04
  • Thanks for your help. But it is definitely not for the faint-hearted. – Max Jun 29 '14 at 11:10
  • Finished the explanation. ` it is definitely not for the faint hearted` You're right, it's far from obvious, but with the explanation I'm sure you'll be able to understand it. Is it working? – zx81 Jun 29 '14 at 11:13
  • After that explanation, I actually owe you 50-60 reputation at least! :) – Max Jun 29 '14 at 11:25
  • Nah, it was a real pleasure, you're most welcome! :) If you want to do me (or you) a favor, go learn some more cool regex! :) For instance there are a few interesting questions in the right pane of my profile, the [regex FAQ](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075) is also good, then answers by some of the regex gods here (click on top users of all time in the regex tag), or sites like regular-expressions.info and rexegg... Regex is cool, Dude! :) – zx81 Jun 29 '14 at 11:29
3

I would do it with awk, if I understood the problem right:

 awk -F: -v s="$search" '$1~/\S/{p=$1}$2~s{print p FS $2}' file

test with movie 3:

kent$ cat f
Person1 : movie1
          : movie 2
          : movie 3
          : movie 4

In the file above, the continuation lines start with spaces/tabs.

kent$  awk -F: -v s="movie 3" '$1~/\S/{p=$1}$2~s{print p FS $2}' f
Person1 : movie 3
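
A variant that avoids \S, for awk versions that do not support it, could look like the sketch below. The function name actor_for_movie is made up, and the title is compared with index() as a literal substring instead of being used as a regex; both are assumptions for the example.

```shell
# actor_for_movie is a hypothetical helper; $1 = movie title, $2 = file
actor_for_movie() {
    awk -F: -v s="$1" '
        $1 ~ /[^ \t]/ { p = $1 }          # first field is not blank: remember the person
        index($2, s)  { print p FS $2 }   # movie field contains the title: print person and movie
    ' "$2"
}
```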
Kent
  • I created a file just like yours, with no leading space in the line with person1: movie1, and I ran the exact command you gave me. It gave just this: (start of line):movie 3. – Max Jun 29 '14 at 11:29
  • I am on linux. It is expected to work there, in case you ran it on mac? – Max Jun 29 '14 at 11:39
  • @Dude I only have linux. I guess because your gawk version is lower than mine, you could try: `awk ... '$1~/[^ \t]/{....}'` – Kent Jun 29 '14 at 12:12
  • Yup it worked. If you don't mind, could you please explain the regex briefly. – Max Jun 29 '14 at 12:18
  • @Dude the regex just matches a string ($1, the first column) if it contains any non-whitespace char. A problem like this is typical for awk. grep is great, but here it is not the right tool for it (my opinion). – Kent Jun 29 '14 at 12:19
2

This might work for you (GNU sed):

sed -n '/^\S/h;/movie 3/{H;x;s/:.*:/:/p}' file

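The -n switch suppresses automatic printing, which gives grep-like behaviour. Save the person line in the hold space; on a matching movie line, append that line to the hold space and swap the result into the pattern space. Then remove the unwanted text between the first and last colon and print.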
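
The same script, spread over several lines with one comment per command, might read as follows. This is a sketch: the wrapper name sed_actor_for_movie3 is made up, the title stays hard-coded as "movie 3", and ^\S is written as ^[^[:space:]] on the assumption that this is more portable.

```shell
# sed_actor_for_movie3 is a hypothetical wrapper; $1 = file
sed_actor_for_movie3() {
    sed -n '
        # line starts with a non-space character: copy it to the hold space
        /^[^[:space:]]/h
        /movie 3/{
            # append the matching line to the held person line
            H
            # swap: the pattern space is now "person line", newline, "movie line"
            x
            # drop everything from the first colon to the last one, then print
            s/:.*:/:/p
        }
    ' "$1"
}
```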

potong
0

This is a bit obscure, but it gets the job done:

awk '/^[^ ]/{p=0} /Person1/{p=1} p'

Example:

Input file:

Person1 : movie1
    : movie 2
    : movie 3
    : movie 4
Person2 : movie 5
    : movie 6

Execution:

awk '/^[^ ]/{p=0} /Person1/{p=1} p' file
Person1 : movie1
    : movie 2
    : movie 3
    : movie 4

awk '/^[^ ]/{p=0} /Person2/{p=1} p' file
Person2 : movie 5
    : movie 6

Note: on the command line, the output is indented.

Explanation:

  1. If the line does not start with a space, set p=0
  2. If the line contains Person1, set p=1
  3. If p is 1, print the line (a bare pattern with no action prints by default; this is the obscure part)
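
The three steps can also be spelled out with the name passed in as a variable and the print made explicit. In this sketch the function name movies_for_person is made up, the name is matched literally with index(), and a leading tab is treated as indentation too, which the one-liner above does not do.

```shell
# movies_for_person is a hypothetical helper; $1 = person name, $2 = file
movies_for_person() {
    awk -v name="$1" '
        /^[^ \t]/       { p = 0 }   # step 1: a non-indented line starts a new block
        index($0, name) { p = 1 }   # step 2: the line mentions the person
        p               { print }   # step 3: explicit form of the bare "p"
    ' "$2"
}
```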

It can be done in Perl too:

perl -ne '$p = 0 if /^\w/; $p = 1 if /Person1/; print if $p' file
Tiago Lopo