This is essentially the command I want. All of it works, except that I want to print something special in the third column that would use shell commands (or just more awk commands, I guess, but I don't know how I would fit that into the original awk statement). All I need help with is the pseudo command substitution between $2 and ar[$4,$1] in the print statement; I left the rest in for the sake of specificity.

awk 'NR==FNR{ar[$3,$2]=$1+ar[$3,$2]; }
     NR>FNR && ar[$4,$1] {print "hs"$1,$2,`awk '$1 == #$1 from outer awk command# file2 | tail -n 1 | awk '{print $3}'`, ar[$4,$1]}' file1 file2

file1 will look like

5   8       t11 
15  7       t12 
3   7       t14

file2 will look like

8 4520 5560 t11 
8 5560 6610 t12 
8 6610 7400 t13 
7 9350 10610 t11 
7 10610 11770 t12 
7 11770 14627 t13
7 14627 16789 t14 

And output should look like

8 4520 7400 5
7 10610 16789 15
7 14627 16789 3

Thank you!

Sam
  • You're thinking about this wrong. Awk is a tool to manipulate text. It is not an environment from which to call tools (including other awk instances) - that's what a shell is for. [edit] your question to describe what you want to do to your input to create your output (as opposed to **how** you think you need to do it) so we can help you. – Ed Morton Jun 09 '16 at 22:59
  • I don't see how `ar[$4,$1]` can return the values in the last column of your sample output. For that matter, I don't see anything in the input that could generate the values `5` and `15`. So, as EdM says, let's see some rules about how to process your data. You can almost certainly achieve your required results either just in `awk`, or by refactoring your shell code and how it calls `awk`. **update your Q** please. Good luck. – shellter Jun 10 '16 at 01:59
  • @shelter, the `5` and `15` are from _file1_. – agc Jun 10 '16 at 06:43
  • @shelter I'm sorry, I edited my question so one of the input files has t12 in its third column instead of t11; the output should hopefully make more sense now. The part with ar[$4,$1] does work (I used an adjusted version of it successfully), but I can't figure this one part out. – Sam Jun 13 '16 at 17:19

1 Answer

Non-awk, inefficient shell tools code:

while read a b c ; do \
    echo -n "$b " ; \
    egrep "^$b " file2 | \
      grep -A 9999999 " $c" | \
      cut -d' ' -f2,3 | \
      sed '1{s/ .*//;t}
           ${s/.* //;t};d' | \
      xargs echo -n  ; \
    echo " $a" ; \
done < file1 | \
  column -t

Output:

8  4520  7400   5
7  10610 16789  15

The main loop reads file1, which controls what in file2 needs to be printed. file1 has 3 fields, so read needs 3 variables: $a, $b, and $c. The output uses $b and $a, so those two variables come "for free" -- the first and last lines of the main loop (both echos) prefix $b and suffix $a to the two numbers in the middle of each line.
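A minimal sketch of how read splits each row of file1, assuming the sample rows from the question. The temp file and the show_fields helper are names made up for this illustration only:

```shell
# Sample file1 from the question, written to a temp file for the demo.
f1=$(mktemp)
printf '5   8       t11\n15  7       t12\n3   7       t14\n' > "$f1"

show_fields() {
    # read splits each line on runs of whitespace into a, b and c.
    while read -r a b c; do
        printf 'a=%s b=%s c=%s\n' "$a" "$b" "$c"
    done < "$1"
}

show_fields "$f1"
```

Each row becomes one line of the form a=5 b=8 c=t11, which is why the loop can use $b as the prefix and $a as the suffix without any further parsing.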

The egrep prints every line in file2 that begins with $b, but of those lines we only want the one that ends in $c plus the lines after that, which is what grep -A ... prints. Only the middle two columns are needed, so cut prints just those columns. Now we have a two column block of numbers, and we only want the upper left corner, or the lower right corner, which the sed code prints...
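To see what actually reaches sed, the first three stages can be run on their own. This is only a sketch on the sample file2 from the question; block_for and the temp file are hypothetical names, and grep -E is used here as the equivalent of egrep:

```shell
# Sample file2 from the question, written to a temp file for the demo.
f2=$(mktemp)
printf '%s\n' \
    '8 4520 5560 t11' '8 5560 6610 t12' '8 6610 7400 t13' \
    '7 9350 10610 t11' '7 10610 11770 t12' '7 11770 14627 t13' \
    '7 14627 16789 t14' > "$f2"

block_for() {   # usage: block_for <file2> <b> <c>
    grep -E "^$2 " "$1" |        # rows whose first column is <b>
      grep -A 9999999 " $3" |    # the row tagged <c> and every row after it
      cut -d' ' -f2,3            # keep only the two middle columns
}

block_for "$f2" 7 t12
```

For b=7 and c=t12 this leaves a three-row, two-column block; the upper left value (10610) and the lower right value (16789) are the two numbers the rest of the pipeline extracts.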

sed automatically counts input lines as it runs. When sed hits the first line, it runs what's in the first set of curly brackets ('1{<code>}'). On any other line, sed checks whether it's the last line ($ means last line); if it is, sed runs what's in the second set of curly brackets ('${<code>}'). If it's neither the first nor the last line, sed deletes it.

Inside those curly brackets: s/ .*// deletes everything from the first space onward, so it works just like cut -d' ' -f1 would. The closing t means 'branch to label if a substitution succeeded', but when there's no label sed jumps to the end of the script, prints the line, and starts a new cycle with the next line -- without t, the code would run the d, and print nothing. With two fields, s/.* // works like cut -d' ' -f2, etc.
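The sed step can be tried in isolation on a small block. In this sketch the same commands are written one per line, which both GNU and BSD sed should accept (an assumption of this example); corners is a made-up helper name:

```shell
corners() {
    # First line: keep column 1. Last line: keep column 2.
    # Every other line: delete. t skips the trailing d after a
    # successful substitution.
    sed '1{
s/ .*//
t
}
${
s/.* //
t
}
d'
}

# Three-line block: prints 10610, then 16789.
printf '10610 11770\n11770 14627\n14627 16789\n' | corners

# One-line block: the line is both first and last, the first
# substitution succeeds and t skips the rest, so only 14627 appears.
printf '14627 16789\n' | corners
```

The second call shows an edge case worth knowing about: when the block has only one row (as for the t14 row in the sample data), the end value in column 2 is not printed.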

On each pass of the main while loop, sed prints two numbers, but each is on its own line. Piping that to xargs echo -n puts both numbers on the same line that $b was printed on.
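A small sketch of that joining step, with the two numbers hard-coded in place of the sed output (join_pair is a hypothetical helper):

```shell
join_pair() {
    # Two numbers, one per line, as sed would emit them; xargs passes
    # both as arguments to a single echo -n call.
    printf '10610\n16789\n' | xargs echo -n
}

join_pair
echo    # finish the line, as the trailing echo " $a" does in the loop
```

Because echo -n suppresses the newline, whatever the loop prints next (here the final echo) continues on the same output line.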

agc
  • Thank you. I realized I typed one of my input files wrong, though: it should have been t12 in column 3 instead of t11 in the row starting with 7, so that the output has 10610 in its second column, second row (that part was there before). Do you know how to adjust this code accordingly? – Sam Jun 13 '16 at 17:21
  • Also when I run this I get sed: 1: "1{s/ .*//;t};${s/.* //; ...": unexpected EOF (pending }'s), and I'm not too familiar with sed – Sam Jun 13 '16 at 17:35
  • Copy the whole block, then paste it _all at once_ to the command line, it should work. – agc Jun 13 '16 at 17:44
  • On row2,col3 of _file1_ being 't12', it makes no difference. The above code does not use column #3 of _file1_, or column #4 of _file2_. – agc Jun 13 '16 at 17:51
  • I copied and pasted the whole thing and still got this: sed: 1: "1{s/ .*//;t};${s/.* //; ...": unexpected EOF (pending }'s) sed: 1: "1{s/ .*//;t};${s/.* //; ...": unexpected EOF (pending }'s) – Sam Jun 13 '16 at 18:18
  • I'm running GNU `sed`. Perhaps you're running an OSX (Apple) or BSD system. Non GNU `sed` might produce an "[Unexpected EOF](http://stackoverflow.com/questions/15467616/sed-gives-me-unexpected-eof-pending-s-error-and-i-have-no-idea-why)" error. I've just tweaked the code to allow for that, so please try the new code. – agc Jun 13 '16 at 18:43
  • OK, thank you. The only remaining problem is that with the new file1, which uses t12 instead of t11 in its third column, it still gives back "7 9350 16789 15" instead of "7 10610 16789 15", which is what I'm looking for. Also, I think I downloaded GNU sed now, but I'm not sure. @agc – Sam Jun 13 '16 at 19:17
  • D'oh! Either I'd not noticed that '10610', or mistook it for a typo. Finally fixed that... (assuming _file2_ has less than 10,000,000 matching lines anyway.) – agc Jun 13 '16 at 19:36
  • Would you mind explaining the sed command a little bit? I'm confused by what the t is doing and why you are putting a delete command at the end. Also, what is meant by line one and the last line? I feel like there is only one line that is passed to it. – Sam Jun 14 '16 at 16:40
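The comments on the question suggest that awk alone can do this. A hedged sketch of one way follows, assuming the rule implied by the question's pseudo-command: column 3 of the output is the third field of the last file2 row that has the same first column. It reads file2 first, then file1, and unlike the original command it does not sum repeated file1 keys. The names range_per_row, start and last are made up for this illustration:

```shell
# Sample inputs from the question, written to temp files for the demo.
f1=$(mktemp); f2=$(mktemp)
printf '5   8       t11\n15  7       t12\n3   7       t14\n' > "$f1"
printf '%s\n' \
    '8 4520 5560 t11' '8 5560 6610 t12' '8 6610 7400 t13' \
    '7 9350 10610 t11' '7 10610 11770 t12' '7 11770 14627 t13' \
    '7 14627 16789 t14' > "$f2"

range_per_row() {   # usage: range_per_row <file1> <file2>
    awk '
        NR == FNR {              # first file read: file2
            start[$1, $4] = $2   # start value for this (column 1, tag) pair
            last[$1] = $3        # overwritten per row; ends as the final end value
            next
        }
        ($2, $3) in start {      # second file read: file1
            print $2, start[$2, $3], last[$2], $1
        }
    ' "$2" "$1"
}

range_per_row "$f1" "$f2"
```

On the sample data this prints one line per file1 row, in file1 order, and also handles the single-row t14 case because the end value is looked up per group rather than taken from the block's last line.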