How can I use sed (or awk or maybe a perl one-liner) to get values from specific columns in file A and use them to find lines in file B?

Question

OK, sedAwkPerl-fu-gurus. Here's one similar to these (Extract specific strings...) and (Using awk to...), except that I need to extract the number in columns 4-10 of each line of File A (a PO number from a sales order line item), use it to locate all related lines in File B, and print those lines to a new file.

File A (purchase order details) lines look like this:

xxx01234560000000000000000000 yyy zzzz000000

File B (vendor codes associated with POs) lines look like this:

00xxxxx01234567890123456789001234567890

Columns 4-10 in File A have a 7-digit PO number, which is found in columns 7-13 of file B. What I need to do is parse File A to get a PO number, and then create a new sub-file from File B containing only those lines in File B which have the POs found in File A. The sub-file created is essentially the sub-set of vendors from File B who have orders found in File A.

I have tried a couple of things, but I'm really spinning my wheels on trying to make a one-liner for this. I could work it out in a script by defining variables, etc., but I'm curious whether someone knows a slick one-liner to do a task like this. The two referenced methods put together ought to do it, but I'm not quite getting it.

Zsolt Botykai · Answer 1 · 2014-07-03T19:14:42.503

sed 's_^...\([0-9]\{7\}\).*_/^.\\{6\\}\1/p_' FIRSTFILE > FILTERLIST
sed -n -f FILTERLIST SECONDFILE > FILTEREDFILE

The first line generates a sed script from FIRSTFILE (one /regex/p command per PO number), then the second line uses that script to filter SECONDFILE. This can be combined into one line too...
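To make the two steps concrete, here is a minimal sketch run against made-up sample lines (the file contents and the temp directory are assumptions for illustration, not real PO data). It uses `[0-9]` for the digits and doubled backslashes in the replacement so that the generated script contains `\{6\}`; it then shows the generated script and the filtered result.

```shell
# Hypothetical sample data: PO 0123456 sits in columns 4-10 of FIRSTFILE
# and in columns 7-13 of the first SECONDFILE line only.
tmp=$(mktemp -d)
printf '%s\n' 'xxx01234560000000000000000000 yyy zzzz000000' > "$tmp/FIRSTFILE"
printf '%s\n' '00xxxx0123456AAAA' '00xxxx9999999BBBB' > "$tmp/SECONDFILE"

# Step 1: turn every FIRSTFILE line into a sed command of the form /^.\{6\}PO/p
sed 's_^...\([0-9]\{7\}\).*_/^.\\{6\\}\1/p_' "$tmp/FIRSTFILE" > "$tmp/FILTERLIST"
script=$(cat "$tmp/FILTERLIST")

# Step 2: run the generated script; -n prints only lines hit by a /.../p command
result=$(sed -n -f "$tmp/FILTERLIST" "$tmp/SECONDFILE")

printf 'generated script: %s\n' "$script"
printf 'filtered lines:   %s\n' "$result"
rm -rf "$tmp"
```

The generated script grows by one command per line of FIRSTFILE, so this stays practical mainly when that file is reasonably small.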

If the files are not that big you can do something like

awk 'NR == FNR { po[substr($0, 4, 7)]; next }   # FIRSTFILE: remember the PO in columns 4-10
     substr($0, 7, 7) in po { print $0 }        # SECONDFILE: print lines with a known PO in columns 7-13
    ' FIRSTFILE SECONDFILE > FILTERED

You can also do it like this (but it will match the PO numbers anywhere on a line):

fgrep -f <(cut -b 4-10 FIRSTFILE) SECONDFILE
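A small sketch of that caveat, using made-up sample lines (all file contents here are hypothetical): the fixed-string match also returns a line where the same seven digits happen to sit further to the right. It is written with `grep -F`, the equivalent of `fgrep`, and reads the patterns from the pipe with `-f -` so it does not rely on bash process substitution.

```shell
# Hypothetical sample data.
tmp=$(mktemp -d)
printf '%s\n' 'xxx01234560000000000000000000 yyy zzzz000000' > "$tmp/FIRSTFILE"
# Line 1: PO in columns 7-13. Line 2: same digits, but in columns 11-17.
# Line 3: a different PO altogether.
printf '%s\n' '00xxxx0123456AAAA' '00xxxx99990123456' '00xxxx7777777CCCC' > "$tmp/SECONDFILE"

# Unanchored fixed-string search: matches the PO wherever it occurs on the line.
unanchored=$(cut -b 4-10 "$tmp/FIRSTFILE" | grep -F -f - "$tmp/SECONDFILE")

printf '%s\n' "$unanchored"
rm -rf "$tmp"
```

With this data both of the first two lines come back, so the approach is only safe if the digits cannot appear elsewhere in File B's lines.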

score 1 · Accepted Answer · answered Jul 03 '14 at 19:13

Here's a one-liner:

egrep -f <(cut -c4-10 A | sed -e 's/^/^.{6}/') B

It looks like the POs in file B actually start at column 8, not 7, but I made my regex start at column 7 as you asked in the question.

And in case there's the possibility of duplicates in A, you could increase efficiency by weeding those out before scanning file B:

egrep -f <(cut -c4-10 A | sort -u | sed -e 's/^/^.{6}/') B
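For illustration, here is the same pipeline on made-up sample lines (hypothetical data), split so the generated pattern list is visible. It uses `grep -E`, the equivalent of `egrep`, and `-f -` in place of process substitution; the column offsets are the ones from the question and would need adjusting if the real PO starts at column 8.

```shell
# Hypothetical sample data: A holds the same PO twice, B holds it once in
# columns 7-13 and once further to the right.
tmp=$(mktemp -d)
printf '%s\n' 'xxx0123456000 yyy' 'xxx0123456111 zzz' > "$tmp/A"
printf '%s\n' '00xxxx0123456AAAA' '00xxxx99990123456' > "$tmp/B"

# Cut the PO, drop duplicates, then prefix each PO with ^.{6} so it can only
# match starting at column 7.
patterns=$(cut -c4-10 "$tmp/A" | sort -u | sed -e 's/^/^.{6}/')

# Feed the anchored patterns to grep -E on stdin.
anchored=$(printf '%s\n' "$patterns" | grep -E -f - "$tmp/B")

printf 'patterns: %s\n' "$patterns"
printf 'matches:  %s\n' "$anchored"
rm -rf "$tmp"
```

Because of the `^.{6}` prefix, the line with the digits in the wrong position is not returned, and `sort -u` collapses the duplicate PO into a single pattern.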

answered Jul 03 '14 at 19:13

dg99

Rock and roll! The second one works like a charm - you were right, my column count was off but that's easy to adjust. Thanks for the fast answer. Cut! Cut! I need to remember cut... – noogrub Jul 03 '14 at 19:35

score 1 · Answer 3 · answered Jul 03 '14 at 20:06

Another way using only grep:

grep -f <(grep -Po '^.{3}\K.{7}' fileA) fileB

Explanation:

-P for perl regex
-o to select only the match
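\K resets the start of the reported match, so the first three characters must be present but are not output (similar in effect to a lookbehind)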
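A quick sketch of what the inner grep hands to the outer one, using a made-up File A line (hypothetical data) and assuming a GNU grep built with PCRE support, since `-P` is not available everywhere:

```shell
# Hypothetical sample line with PO 0123456 in columns 4-10.
tmp=$(mktemp -d)
printf '%s\n' 'xxx01234560000000000000000000 yyy zzzz000000' > "$tmp/fileA"

# ^.{3} consumes columns 1-3, \K drops them from the match,
# .{7} is what -o prints: columns 4-10.
extracted=$(grep -Po '^.{3}\K.{7}' "$tmp/fileA")

printf '%s\n' "$extracted"
rm -rf "$tmp"
```

The outer `grep -f` then treats each extracted PO as an unanchored pattern, so it matches the digits anywhere on a File B line.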

answered Jul 03 '14 at 20:06

Tiago Lopo

lol - wow. I had no idea these could be so short. Thank you all. Great fun! – noogrub Jul 03 '14 at 20:29
