1

I'm having trouble understanding this awk code:

$0 ~ ENVIRON["search"] {
  match($0, /id=[0-9]+/);
  if (RSTART) {
    print substr($0, RSTART+3, RLENGTH-3)
  }
}
  1. How do the ~ and match() operators interact with each other?

  2. How does the match() have any effect, if its output isn't printed or echo'd? What does it actually return or do? How can I use it in my own code?

This is related to "Why are $0, ~, &c. used in a way that violates usual bash syntax docs inside an argument to awk?", but that question centered on the distinction between bash and awk syntax, whereas this one is about understanding the awk portions of the script.

Charles Duffy
rado
  • What do you mean `into an algorithm`, do you mean into pseudocode? Or are you just asking how it works? You may want to try reading the manual. – 123 May 24 '17 at 20:02
  • You were already told - that's not bash, it's awk. Read Effective Awk Programming, 4th Edition, by Arnold Robbins. – Ed Morton May 24 '17 at 20:08
  • Answering specifically the question in your first paragraph after the code block, your interpretation of `awk '$0 ~ /ab/'` is correct: The code you provided is a conditional, and the default action that awk performs when given a conditional is `{ print $0 }`, printing the entire line when that conditional is true. (By contrast, if you give it only a block and no condition describing when to run it, `awk` runs that block unconditionally for every line of input). – Charles Duffy May 24 '17 at 20:12
  • ...so, if you ran `printf '%s\n' abcd efgh ijkl | awk '$0 ~ /ab/'`, only the first line of the three emitted by `printf` would in turn be printed by `awk`. – Charles Duffy May 24 '17 at 20:13
  • ...the tricky thing about `match` is that it isn't actually returning a string; instead, it's returning an integer, and setting separate variables (`RSTART`, `RLENGTH`) based on where it finds the regex it's searching for. See https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html – Charles Duffy May 24 '17 at 20:16
  • BTW, if you want to print a line from awk, that would be `print`, not `echo`. Different languages, different names for things, &c. – Charles Duffy May 24 '17 at 20:20
  • As another aside, in `awk 'BEGIN{echo match ('abc1223abc',/2/)}'`, the inner quotes are actually being eaten by the shell that started awk, not becoming part of the script itself. One way to avoid that is to use double-quotes on the inside for string literals: `awk 'BEGIN{print match ("abc1223abc",/2/)}'` – Charles Duffy May 24 '17 at 20:22
  • Although if you have GNU awk, you can use match and store the results in an array using a third arg `match("string",/regex/,array)` – 123 May 24 '17 at 20:26
  • Shoot. Removed the bash tag from the question, and now I can't single-handedly reopen it (not having a dupehammer in the awk tag). – Charles Duffy May 24 '17 at 20:27
  • @CharlesDuffy It should still be closed as it is basic information that can be found in the manual, and is not likely to be of any help to anyone else. – 123 May 24 '17 at 21:19
  • @CharlesDuffy Thanks for this valuable information. Guys, close as you wish, I know my level of knowledge was not the minimum accepted here but I really didn't know even this. Thank you all. – rado May 24 '17 at 21:34

2 Answers

3

Taking your questions one at a time:

  1. How do the ~ and match() operators interact with each other?

They don't, at least not directly in your code. ~ is the regexp matching operator. In $0 ~ ENVIRON["search"] it tests whether the current record ($0) contains a match for the regexp held in the environment variable search. If it does, the code in the subsequent {...} block is executed; if it doesn't, the block is skipped for that record.
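
To see the pattern test on its own, here is a minimal sketch. The sample lines and the value given to search are invented for illustration. A pattern with no action block prints each record it matches, so the output shows which records would have reached the {...} block.

```shell
# Hypothetical sample input. "search" must be an environment variable for
# ENVIRON to see it; here it is set only for this one awk command.
matched=$(printf '%s\n' 'mouse id=12' 'keyboard id=7' 'touchpad' |
  search='mouse|keyboard' awk '$0 ~ ENVIRON["search"]')
printf '%s\n' "$matched"
```

The string in search is interpreted as a regexp, so the | acts as alternation and the first two sample lines are selected.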

  1. How does the match() have any effect, if its output isn't printed or echoed?

It identifies the starting point (and stores it in the awk variable RSTART) and the length (RLENGTH) of the first substring within the first parameter ($0) that matches the regexp provided as the second parameter (id=[0-9]+). With GNU awk it can also populate a 3rd array argument with segments of the matching string identified by round brackets (aka "capture groups").
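
A small sketch of those side effects, using a made-up input line rather than anything from the question: it prints RSTART, RLENGTH, the whole matched text, and the same text with the three characters of id= skipped, which is what the substr() call in the question does.

```shell
# The sample line is invented for illustration.
fields=$(printf '%s\n' 'mouse id=12 [slave pointer]' |
  awk '{
    match($0, /id=[0-9]+/)   # result is read from RSTART and RLENGTH below
    # Here RSTART is the position of the "i" and RLENGTH the length of "id=12".
    print RSTART, RLENGTH, substr($0, RSTART, RLENGTH), substr($0, RSTART+3, RLENGTH-3)
  }')
printf '%s\n' "$fields"
```

Adding 3 to RSTART and subtracting 3 from RLENGTH moves the window past id= so that only the digits remain.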

  1. What does it actually return or do?

It returns the value of RSTART, which is 0 if no match was found and 1 or greater otherwise (awk string positions start at 1). For what it does, see the previous answer.
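
Because 0 is false and any other number is true in awk, the return value can be used directly as a condition. The following is only a sketch with invented input lines, one with an id= field and one without.

```shell
# Sample lines are invented; the second has no id= field.
result=$(printf '%s\n' 'mouse id=12' 'touchpad' |
  awk '{
    pos = match($0, /id=[0-9]+/)   # pos holds the same value as RSTART
    if (pos) { print "match at " pos }
    else     { print "no match (returned " pos ", RLENGTH=" RLENGTH ")" }
  }')
printf '%s\n' "$result"
```

On a failed match RLENGTH is set to -1, which is why the code in the question checks RSTART before calling substr().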

  1. How can I use it in my own code?

The example you posted shows one way, but that code would more typically be written as:

($0 ~ ENVIRON["search"]) && match($0,/id=[0-9]+/) {
    print substr($0, RSTART+3, RLENGTH-3)
}

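and using a string rather than regexp comparison for the first part would probably be even more appropriate, since index() looks for search as a literal substring and does not interpret regexp metacharacters in it: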

index($0,ENVIRON["search"]) && match($0,/id=[0-9]+/) {
    print substr($0, RSTART+3, RLENGTH-3)
}
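
The difference between the two variants only shows up when search contains regexp metacharacters. As a sketch under assumed inputs (the two sample lines and the value a.c are made up), the regexp test treats the dot as "any character" while index() finds only the literal text:

```shell
# Invented input: one line contains "abc", the other the literal text "a.c".
input=$(printf '%s\n' 'abc id=1' 'a.c id=2')

# Regexp comparison: "a.c" matches both "abc" and "a.c".
regex_ids=$(printf '%s\n' "$input" | search='a.c' awk '
  ($0 ~ ENVIRON["search"]) && match($0, /id=[0-9]+/) {
    print substr($0, RSTART+3, RLENGTH-3)
  }')

# String comparison: index() only finds the literal substring "a.c".
literal_ids=$(printf '%s\n' "$input" | search='a.c' awk '
  index($0, ENVIRON["search"]) && match($0, /id=[0-9]+/) {
    print substr($0, RSTART+3, RLENGTH-3)
  }')

printf 'regexp test:\n%s\nstring test:\n%s\n' "$regex_ids" "$literal_ids"
```

Which behavior is appropriate depends on whether search is meant to hold a pattern or plain text.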

Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins to learn how to use awk.

mklement0
Ed Morton
  • That was the exact answer I was looking for. Thanks for your patience and knowledge. – rado May 31 '17 at 19:02
-2
  • use the regex id=[0-9]+ to find a match in each line
  • if the start position of the match (RSTART) is not 0 then:
  • print the match without the id=

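This is shorter and does nearly the same (it skips the search filter, and grep -o prints every match on a line rather than only the first):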

xinput --list | grep -Po 'id=[0-9]+' | cut -c4-
Timo
  • They are asking how the code works, not to rewrite it. – 123 May 24 '17 at 20:06
  • If you have to ask on Stack Overflow to understand the code, the code might be too complicated. So rewriting it to improve the readability is definitely an option. – Timo May 24 '17 at 20:09
  • No it isn't, you can't decide what the question should be. – 123 May 24 '17 at 20:09