save multiple matches in a list (grep or awk)

Question

I have a file that looks something like this:

```
# a mess of text
Hello. Student Joe Deere has
id number 1. Over.
# some more messy text
Hello. Student Steve Michael Smith has
id number 2. Over.
# etc.
```

I want to record the pairs (Joe Deere, 1), (Steve Michael Smith, 2), etc. in a list (or in two separate lists kept in the same order), because I will need to loop over those pairs and do something with each name and id.

The names and ids are on separate lines, but they appear in the text in the order name1, id1, name2, id2, etc. I can extract the lines of interest with

```
VAR=$(awk '/Student/,/Over/' filename.txt)
```
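For reference, here is a minimal sketch of what that range pattern selects. It recreates the sample input from above in a temporary file made with `mktemp` (the variable names `sample` and `selected` are just for illustration) and runs the same `awk` command:

```shell
# Recreate the sample input in a throwaway file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# a mess of text
Hello. Student Joe Deere has
id number 1. Over.
# some more messy text
Hello. Student Steve Michael Smith has
id number 2. Over.
# etc.
EOF

# /Student/,/Over/ is a range pattern: it selects every line from one that
# matches "Student" up to and including the next line that matches "Over".
selected=$(awk '/Student/,/Over/' "$sample")
printf '%s\n' "$selected"
rm -f "$sample"
```

The range only narrows the input; each name and its id still sit on separate lines, so something still has to pair each id line with the line before it.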

I think I know how to extract the names and ids with grep, but it will give me the result as one big block like

`Joe Deere 1 Steve Michael Smith 2 ...`

(possibly with a separator between names and ids). I am not sure how to proceed from there, and in any case it doesn't feel like the right approach.

I am sure that there is a one-liner in awk that will do what I need. The possibilities are infinite and the documentation monumental.

Any suggestions?

score 2 · Answer 1 · answered Nov 03 '18 at 12:36

```
$ cat tst.awk
/^id number/ {
    gsub(/^([^ ]+ ){2}| [^ ]+$/,"",prev)
    printf "(%s, %d)\n", prev, $3
}
{ prev = $0 }

$ awk -f tst.awk file
(Joe Deere, 1)
(Steve Michael Smith, 2)
```
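A small sketch of what the `gsub()` call does to `prev` (the previous line): the first alternative strips the first two words, and the second strips the last word. This version writes `([^ ]+ ){2}` out as `[^ ]+ [^ ]+ ` and applies the two halves as separate `sub()` calls, on the assumption that some older awk builds (such as older mawk releases) don't support `{n}` interval expressions. `extract_name` is a hypothetical helper name.

```shell
# Hypothetical helper: strip the first two words ("Hello." and "Student")
# and the last word ("has"), leaving only the name. The answer does both
# removals at once with a single gsub() and an alternation.
extract_name() {
    printf '%s\n' "$1" | awk '{
        sub(/^[^ ]+ [^ ]+ /, "")   # drop the leading "Hello. Student "
        sub(/ [^ ]+$/, "")         # drop the trailing " has"
        print
    }'
}

name1=$(extract_name 'Hello. Student Joe Deere has')
name2=$(extract_name 'Hello. Student Steve Michael Smith has')
printf '%s\n%s\n' "$name1" "$name2"

# On "id number 1. Over." the third field is "1."; printf's %d
# converts it to the number 1, which drops the trailing period.
id=$(echo 'id number 1. Over.' | awk '{ printf "%d\n", $3 }')
printf '%s\n' "$id"
```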

answered Nov 03 '18 at 12:36

Ed Morton

@Ed Morton Well, it will take me a little while to figure out exactly what this is doing! But at least I get the same result. I still have a bit of work to do. If I store the result using `VAR=$(awk -f tst.awk file)`, then `echo $VAR` returns `(Joe Deere, 1) (Steve Smith , 2)` (it seems line endings are converted to spaces). Using [this other post](https://stackoverflow.com/questions/10586153/split-string-into-an-array-in-bash) I should be able to work things out. – Antoine Nov 03 '18 at 13:25
You need to learn about quoting in shell ASAP, as it's crucial that you understand the implications of improper quoting. You want `echo "$VAR"`, not `echo $VAR`. The former passes the contents of `$VAR` to `echo` as one argument, while the latter splits the contents of `$VAR` on the characters in `$IFS` (by default spaces, tabs, and newlines), performs globbing (filename expansion) on each resulting word, and passes the words to `echo` as separate arguments, which `echo` prints separated by single spaces. A huge difference. Always quote shell variables unless you have a specific reason not to and fully understand all of the implications. Also, don't use all uppercase for non-exported variable names. – Ed Morton Nov 03 '18 at 13:26
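A minimal sketch of the difference described in that comment, using a made-up two-line value (the variable names `pairs`, `unquoted`, and `quoted` are hypothetical): unquoted, the line break is lost to word splitting; quoted, it survives, so the value can still be read back one line at a time.

```shell
pairs='(Joe Deere, 1)
(Steve Michael Smith, 2)'

# Unquoted: $pairs is split on whitespace (spaces, tabs, newlines) and each
# piece is subject to globbing; echo gets several arguments and prints them
# separated by single spaces, so the line break is lost.
unquoted=$(echo $pairs)

# Quoted: echo gets the whole value as one argument, line break intact.
quoted=$(echo "$pairs")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The lowercase name also follows the comment's naming advice: it avoids colliding with all-uppercase environment and shell variables such as `PATH`.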

score 1 · Answer 2 · answered Nov 03 '18 at 12:42

Could you please try the following too.

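```
awk '
/id number/{                # e.g. "id number 1. Over."
  sub(/\./,"",$3)           # "1." becomes "1"
  print val", "$3           # name from the preceding line, then the id
  val=""
  next
}
{
  gsub(/Hello\. Student | has.*/,"")   # on a name line, keep only the name
  val=$0
}
'  Input_file
```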
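The question's stated goal is to loop over the pairs. Here is a minimal POSIX-shell sketch (not part of either answer) that consumes lines in this answer's `name, id` format and splits each one at the last `, ` using parameter expansion. The variable `output` stands in for the awk output, `handle_pair` is a hypothetical placeholder for the per-pair work, and it assumes no name contains `, `.

```shell
# Hypothetical per-pair action; replace with the real work.
handle_pair() {
    printf 'name=[%s] id=[%s]\n' "$1" "$2"
}

# Stand-in for the output of the awk script above.
output='Joe Deere, 1
Steve Michael Smith, 2'

result=$(
    printf '%s\n' "$output" |
    while IFS= read -r line; do
        name=${line%, *}    # everything before the last ", "
        id=${line##*, }     # everything after the last ", "
        handle_pair "$name" "$id"
    done
)
printf '%s\n' "$result"
```

Because of the pipe, the `while` loop runs in a subshell in shells such as bash and dash, so variables assigned inside it don't survive the loop; the sketch captures the loop's output instead. In bash, feeding the loop with `done < <(awk ...)` keeps it in the current shell.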

answered Nov 03 '18 at 12:42

RavinderSingh13

Again, I will need some time to decipher what your solution does. I notice that the output doesn't capture the full name: some names are first name + last name, others are first name + middle name + last name. Your solution is missing the middle name "Michael"... And as above, I still need to convert the result into an array. But thanks! – Antoine Nov 03 '18 at 13:26
@Antoine, I get `Joe Deere, 1 Steve Michael Smith, 2` from the above code; could you please mention where it is NOT working? – RavinderSingh13 Nov 03 '18 at 13:40
Sorry! You are right (I was testing with the wrong file, one which didn't have a middle name "Michael" so I thought your code didn't capture it). I get the same result as yours. – Antoine Nov 03 '18 at 13:53
@Antoine, always try to encourage people who are helping you by upvoting their answers, and try to accept the correct answer from among all the answers. – RavinderSingh13 Nov 03 '18 at 13:56
You make a good point, and I upvoted the two answers that worked for me. That being said, I like to understand the solutions that I am given in order to judge how good they are. Even though an answer may work when I try it, what if it turns out to be a bad solution for some reason that I am not yet able to appreciate? Don't take it personally, and don't think it is negligence or lack of appreciation for people's help! I am quite principled, I guess, and I often prefer to err on the side of safety. – Antoine Nov 03 '18 at 16:59
@Antoine you're always free to ask questions about anything you don't understand. On our end, we're not going to document every tiny little script using common language constructs, but we generally don't mind answering questions from anyone who's put a little effort into researching the constructs themselves and just needs a bit more help piecing it together. – Ed Morton Nov 03 '18 at 18:08

score 0 · Answer 3 · answered Nov 03 '18 at 12:57

```
grep -oP 'Hello. Student \K.+(?= has)|id number \K\d+' file | paste - -
```
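Roughly: `-o` prints only the matched text, `\K` (a PCRE feature, hence `-P`) drops everything matched so far from the reported match, and `(?= has)` is a lookahead that requires ` has` without including it. So grep emits the names and ids on alternating lines, and `paste - -` reads standard input once per `-`, joining each consecutive pair of lines with a tab. That last step can be checked on its own with made-up input (the variable name `joined` is illustrative):

```shell
# Two "-" arguments: paste reads stdin twice per output line, so
# consecutive lines are joined pairwise with a tab.
joined=$(printf '%s\n' 'Joe Deere' 1 'Steve Michael Smith' 2 | paste - -)
printf '%s\n' "$joined"
```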

answered Nov 03 '18 at 12:57

glenn jackman

My version of grep doesn't support `-P`... Still, I will learn something by trying to understand what your code does! – Antoine Nov 03 '18 at 13:29
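For a grep without `-P`, one possible substitute (a suggestion, not from the answer) is POSIX `sed` with capture groups. `-n` suppresses default output, and each `s///p` prints only when its pattern matched, so the output has the same alternating name/id lines for `paste - -`. It assumes the exact wording `Hello. Student ... has` and `id number N.` from the sample; the temporary file stands in for the real input.

```shell
sample=$(mktemp)
printf '%s\n' '# a mess of text' \
    'Hello. Student Joe Deere has' \
    'id number 1. Over.' \
    '# some more messy text' \
    'Hello. Student Steve Michael Smith has' \
    'id number 2. Over.' > "$sample"

# Keep the name between "Hello. Student " and " has",
# and the digits after "id number ".
pairs=$(sed -n \
    -e 's/^Hello\. Student \(.*\) has.*$/\1/p' \
    -e 's/^id number \([0-9][0-9]*\)\..*$/\1/p' \
    "$sample" | paste - -)
printf '%s\n' "$pairs"
rm -f "$sample"
```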