Match a line and return a previous line before the match containing a pattern

Question

I'm writing a Bash script that processes a file containing several entries, where each entry has this structure:

Id: 33239
Folder: /Contacts/Holder/Center
Date: 04/17/20 13:17
Revision: 34011
Attrs:
  firstName: Name
  lastName: First Second
  mobilePhone: +345555555
  fileAs: 2
  jobTitle: Médico
  company: some company
  email: test_1@somedomain.com

I need to find the "Id" of the item associated with a specific "email". To do that, I'm trying to use "sed" with the hold space, but I can't get it to work. This is what I have so far, and it does not give me the results I need.

id=$(grep $usuario -B20 /tmp/contactos \
    | grep "Folder: /Contacts/Holder" -B2 -A20 \
    | sed -n "/^Id: /h;/^  email: $usuario/{g;p;}" \
    | awk '{print $2}')

With this I'm trying to:

id= - assign the value to a variable I will use later on the script

$(grep $usuario -B20 /tmp/contactos - Get all the lines in the file where the email appears, plus the 20 lines before each match. This is because the email can be associated with more than one Id, and it appears an undetermined number of lines below the Id itself.

grep 'Folder: /Contacts/Holder' -B2 -A20 - I filter again, trying to get now only the results for the Ids for that email in a specific "folder path".

sed -n "/^Id: /h;/^  email: $usuario/{g;p;}" - This is the part that is not working and I don't know how to fix it. Here, I try to return the line containing the Id: associated with the email, something like Id: 33239 in this example.

awk '{print $2}') - Just me trying to print only the number from that line (33239).

Can anyone please help me understand how I can do it with sed? Any other option is also more than welcome :)

Thank you very much!

something like `awk '$1=="email:" && $2=="test_1@somedomain.com"{print id} $1=="Id:"{id = $2}' input_file` ? — Sundeep, Apr 17 '20 at 12:31
This works great! No...I don't feel bad that I wasted 2 hours on something you solved in 2 minutes (you can't hear it but there is a lot of irony in that :P). Could you please let me know how it operates? I think this kung-fu will be very useful for other things too. Thank you very much! — carrotcakeslayer, Apr 17 '20 at 12:37
well, I could solve it easily because I've come across such a problem before and been using awk for a few years now.. see https://stackoverflow.com/tags/awk/info for learning resources — Sundeep, Apr 17 '20 at 13:45
Please clarify whether the number of *Attr:* items is constant, (there are 7 in the example), or might vary, (*i.e.* more or less than 7). — agc, Apr 17 '20 at 14:26

Benjamin W. · Accepted Answer · 2020-04-17T16:14:19.110

This sed command should extract it:

sed -n '
    /^Id: / {                 # If the line starts with "Id: "
        s///                  # Remove the "Id: "
        h                     # Store what is left in the hold space
    }
    /^  email: '"$email"'/ {  # If the line starts with "  email: " plus the email
        x                     # Swap pattern and hold space
        p                     # Print pattern space
        q                     # Stop processing
    }
' infile

where $email is the shell variable containing the escaped version of test_1@somedomain.com:

raw='test_1@somedomain.com'
email=$(sed 's|[]/.*^$\[]|\\&|g' <<< "$raw")

This escapes the sed special characters .*/^$[]\.

Or, more compact:

sed -n '/^Id: /{s///;h};/^  email: '"$email"'/{x;p;q}' infile

macOS sed requires an extra ; before each closing }.
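As a rough illustration of that note, the one-liner can be written with a ; before each closing }, a form that GNU sed accepts as well. This is only a sketch: the sample file and its two entries are made up for the demonstration, and the entries are shortened versions of the format in the question.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two shortened entries in the question's format.
infile=$(mktemp)
cat > "$infile" <<'EOF'
Id: 33239
Folder: /Contacts/Holder/Center
Attrs:
  firstName: Name
  email: test_1@somedomain.com

Id: 40000
Folder: /Contacts/Other
Attrs:
  firstName: Other
  email: test_2@somedomain.com
EOF

# Escape regex-special characters so the address is matched literally.
raw='test_2@somedomain.com'
email=$(sed 's|[]/.*^$\[]|\\&|g' <<< "$raw")

# Every command inside a { } group, including the last one, ends with ";".
id=$(sed -n '/^Id: /{s///;h;};/^  email: '"$email"'/{x;p;q;}' "$infile")
printf '%s\n' "$id"

rm -f "$infile"
```

Because the pattern has no trailing $ anchor, it would also match a longer address that merely starts with the given one; appending $ after the variable tightens that if needed.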

And yes, it's probably easier with awk.

Awesome. Thank you very much!!! This also works like a charm! — carrotcakeslayer, Apr 17 '20 at 17:47

score 3 · Answer 2 · answered Apr 17 '20 at 13:42

awk '$1=="email:" && $2=="test_1@somedomain.com"{print id} $1=="Id:"{id = $2}' input_file

The default field separator splits on spaces/tabs/newlines, so leading and trailing whitespace never ends up in a field.
$1=="email:" checks whether the first field is exactly email: (this is a string comparison, not a regexp).
$1=="email:" && $2=="test_1@somedomain.com"{print id} prints the id variable if both conditions are satisfied.
$1=="Id:"{id = $2} saves the id value whenever the first field is Id:.

Here, I've used a hard-coded string value for the email to be checked; see this Q&A to learn how to pass a shell variable.
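One common way to pass the shell variable is awk's -v option. The following is a minimal sketch, assuming the address is held in a shell variable named usuario as in the question; the sample file and the awk variable name email are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two shortened entries in the question's format.
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
Id: 33239
Folder: /Contacts/Holder/Center
Attrs:
  firstName: Name
  email: test_1@somedomain.com

Id: 40000
Folder: /Contacts/Other
Attrs:
  firstName: Other
  email: test_2@somedomain.com
EOF

usuario='test_1@somedomain.com'

# -v copies the shell value into the awk variable "email" before any input is read.
id=$(awk -v email="$usuario" '
    $1 == "Id:"                    { id = $2 }    # remember the most recent Id
    $1 == "email:" && $2 == email  { print id }   # exact string comparison
' "$input_file")
printf '%s\n' "$id"

rm -f "$input_file"
```

Note that awk interprets backslash escape sequences in a value passed with -v, which does not matter for a typical email address but can for other strings.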

score 2 · Answer 3 · answered Apr 17 '20 at 15:20

In the larger picture, you are trying to chain together conditions on the Folder and Email to produce an Id. So awk is a better choice for solving the whole problem. If your Bash script can prepare the script below, then you can invoke it like this:

id=$(awk -f /tmp/script.awk -v usario=test_1 /tmp/contactos)

Here is the content that your Bash script should write to /tmp/script.awk:

/Id:/   { id=$2; folder="" }
/Folder:..Contacts.Holder/  { folder=$2 }
/email:/    { if (match($2, "^" usario "@") && folder != "") print id }

You should guard against the "matching prefix" problem: for example, a search for "juan" should not also match "juanita". That's why the script uses the match() function with a regular expression that evaluates like match($2, "^juan@"). That matches "juan@domain.com" but not "juanita@domain.com" or "somejuan@domain.com".

Note: The awk syntax concatenates strings and variables that are separated by spaces. It "takes some getting used to" as they say. You can add parentheses around "^" usario "@" if that helps...
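To see the prefix guard and the folder condition working together, here is a small sketch that runs the same three rules inline instead of from /tmp/script.awk. The three sample entries are invented for the demonstration; the variable name usario is kept as spelled in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: three shortened entries.
contactos=$(mktemp)
cat > "$contactos" <<'EOF'
Id: 1
Folder: /Contacts/Holder/Center
Attrs:
  email: juanita@domain.com

Id: 2
Folder: /Contacts/Holder/Center
Attrs:
  email: juan@domain.com

Id: 3
Folder: /Contacts/Other
Attrs:
  email: juan@domain.com
EOF

id=$(awk -v usario=juan '
    /Id:/                      { id = $2; folder = "" }   # new entry: reset folder
    /Folder:..Contacts.Holder/ { folder = $2 }            # only set for Holder folders
    /email:/ { if (match($2, "^" usario "@") && folder != "") print id }
' "$contactos")
printf '%s\n' "$id"

rm -f "$contactos"
```

Entry 1 is skipped because "juanita@" does not match "^juan@", and entry 3 is skipped because its folder is not under /Contacts/Holder, so only entry 2 is reported.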

score 1 · Answer 4 · answered Apr 17 '20 at 15:16

Here are two silly pure Bash methods (no external utilities). Both require a constant number of field names and attributes per entry, and the first additionally requires a relatively short input file:

# The continuation lines start in column 0 so the three quoted pieces join into one format string.
printf '%0.0s%s%0.0s %s%0.0s%0.0s%0.0s%0.0s%0.0s%0.0s%0.0s'\
'%0.0s%0.0s%0.0s%0.0s%0.0s%0.0s%0.0s%0.0s%0.0s%0.0s'\
'%0.0s%0.0s%0.0s%0.0s %s\n' $(<infile) |
    while read Id Folder email; do 
        [[ $email == test_1@somedomain.com && 
           $Folder == /Contacts/Holder/Center ]] && 
        echo $Id
    done

How it works: after the printf, what's fed to while looks like:

33239 /Contacts/Holder/Center test_1@somedomain.com

The same thing can be done by using read a lot:

while read a Id && read a Folder && read && read && read &&
      read && read && read && read && read && read && 
      read a email; do
      [[ $email == test_1@somedomain.com &&
         $Folder == /Contacts/Holder/Center ]] &&
      echo $Id
done < infile

Pure shell methods are inefficient of course, but may be useful when the job is trivial and resources are very low, *e.g.* embedded systems. — agc, Apr 17 '20 at 15:21
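If the number of attribute lines can vary, a pure Bash loop can instead key on the first word of each line. This is only a sketch under that assumption: the sample file and the names want_email, want_folder and result are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: entries with different numbers of attributes.
infile=$(mktemp)
cat > "$infile" <<'EOF'
Id: 33239
Folder: /Contacts/Holder/Center
Attrs:
  firstName: Name
  lastName: First Second
  email: test_1@somedomain.com

Id: 40000
Folder: /Contacts/Other
Attrs:
  email: test_1@somedomain.com
EOF

want_email='test_1@somedomain.com'
want_folder='/Contacts/Holder/Center'
result=''

# read strips leading whitespace; key gets the first word, value the rest of the line.
while read -r key value; do
    case $key in
        Id:)     id=$value; folder='' ;;   # new entry: remember Id, reset folder
        Folder:) folder=$value ;;
        email:)  [[ $value == "$want_email" && $folder == "$want_folder" ]] && result=$id ;;
    esac
done < "$infile"
printf '%s\n' "$result"

rm -f "$infile"
```

The loop reads from a redirection rather than a pipe, so result is still set after the loop finishes.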
