I have a directory of daily files, each with over 500 fields per line and 600,000 rows.
I want to look at one file and find all lines that contain B2 in field #351.
Then I want to search all files for any lines whose fields 282, 341, 314, and 348 match the values of those fields in the lines found in the first file.
Right now I have the following but it produces blank output:
ARCHIVEDIR=/appl/dir/archive
file1_tmp=$$.tmp
zcat ${ARCHIVEDIR}/FILE_12162019.gz | awk 'BEGIN{FS=OFS="|"} $351 == "B2"{gsub(/ /,""); print $282,$341,$314,$348}' > "$file1_tmp"
for fname in ${ARCHIVEDIR}/FILE_*; do
zcat "$fname" | awk -v fname="$fname" '
BEGIN { FS=OFS=SUBSEP="|" }
NR==FNR { tgts[$0]; next }
($282,$341,$314,$348) in tgts { print fname, $0 }
' "$file1_tmp" -
done
For example, file1 has 130,000 records containing B2 in field 351. I want to find any records from all files (including the original in file1) matching fields 282, 341, 314, and 348.
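To make the intent concrete, here is a minimal sketch of the same two-pass idea on made-up 10-field data, where fields 1, 3, and 6 stand in for the real key fields and field 10 stands in for field 351. The file names (demo1.gz, demo2.gz, keys.tmp) are hypothetical. One possible cause of a blank result, which is only an assumption since the real data isn't shown, is that the first pass strips spaces with gsub while the second pass compares the raw fields. The sketch therefore normalizes the key the same way in both passes, and it loads the key file in BEGIN so that an empty key file cannot be confused with the piped input.

```shell
#!/bin/sh
# Sketch only: 10-field demo data; fields 1, 3, 6 stand in for the real key
# fields and field 10 for field 351. All file names here are made up.
workdir=$(mktemp -d)

printf '%s\n' \
  'SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED' \
  'SYSTEM2|SPACER|88083|SPACER|SPACER|FLORIDA|SPACER|SPACER|SPACER|MOUNTED' |
  gzip -c > "$workdir/demo1.gz"
printf '%s\n' \
  'SYSTEM1|SPACER|1435 |SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|MOUNTED' \
  'SYSTEM2|SPACER|999|SPACER|SPACER|TEXAS|SPACER|SPACER|SPACER|STUFFED' |
  gzip -c > "$workdir/demo2.gz"

# Pass 1: one key per selected line, with spaces stripped from the key only.
gzip -dc "$workdir/demo1.gz" |
  awk 'BEGIN { FS = OFS = "|" }
       $10 == "STUFFED" { k = $1 OFS $3 OFS $6; gsub(/ /, "", k); print k }' \
  > "$workdir/keys.tmp"

# Pass 2: load the keys once per file, then build each line's key the same way.
matches=$(
  for fname in "$workdir"/demo*.gz; do
    gzip -dc "$fname" |
      awk -v keys="$workdir/keys.tmp" -v fname="${fname##*/}" '
        BEGIN {
          FS = OFS = "|"
          while ((getline line < keys) > 0) tgts[line]
          close(keys)
        }
        { k = $1 OFS $3 OFS $6; gsub(/ /, "", k) }
        k in tgts { print fname, $0 }'
  done
)
printf '%s\n' "$matches"
rm -rf "$workdir"
```

In this sketch the padded value "1435 " in demo2.gz still matches, because the key is stripped of spaces on both sides of the comparison while the printed line is left untouched.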
ORIGINAL POST Below - reposted to try and clear up some confusion
I gave up trying and ended up with the following in a for loop:
echo -e "$FILENAME|\c"
zcat $FILENAME | grep "$SYSTEM" | grep "$RECORDNUM" | grep "$LOCATION" | grep "$PENGUINS"
The output is:
FILENAME|{each line that matches all 4 search variables}
I'm looking for an awk command that will produce that output efficiently.
I have tried:
zcat $FILENAME | awk -v FILENAME=$FILENAME -v SYSTEM=$SYSTEM -v RECORDNUM=$RECORDNUM -v LOCATION=$LOCATION -v PENGUINS=$PENGUINS -v FS="|" -v OFS='|' '/SYSTEM/ && /RECORDNUM/ && /LOCATION/ && /PENGUINS/ {print FILENAME,$0}'
and because the positional values will always be the same, I even tried the following:
zcat $FILENAME | awk -v FILENAME=$FILENAME -v SYSTEM=$SYSTEM -v RECORDNUM=$RECORDNUM -v LOCATION=$LOCATION -v PENGUINS=$PENGUINS -v FS="|" -v OFS='|' '($282 == SYSTEM) && ($341 == RECORDNUM) && ($314 == LOCATION) && ($348 == PENGUINS) {print FILENAME,$0}'
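For reference, a minimal sketch of the field-by-field comparison on a single 10-field line, with fields 1, 3, 6, and 10 standing in for 282, 341, 314, and 348. The awk variable names (fname, sys, rec, loc, pen) and the file name demo.gz are made up for the example. FILENAME is a built-in awk variable, so the sketch passes the file name under a different name to avoid any clash, and every -v value is quoted so that a value containing spaces stays one argument.

```shell
#!/bin/sh
# Sketch only: one demo line; fields 1,3,6,10 stand in for 282,341,314,348.
line='SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED'
SYSTEM='SYSTEM1'; RECORDNUM='1435'; LOCATION='PHILLY'; PENGUINS='STUFFED'
fname='demo.gz'   # hypothetical file name

field_hit=$(printf '%s\n' "$line" |
  awk -v fname="$fname" -v sys="$SYSTEM" -v rec="$RECORDNUM" \
      -v loc="$LOCATION" -v pen="$PENGUINS" '
    BEGIN { FS = OFS = "|" }
    # Compare whole fields against the values passed in with -v.
    $1 == sys && $3 == rec && $6 == loc && $10 == pen { print fname, $0 }')
printf '%s\n' "$field_hit"
```

Note that awk compares two values numerically when both look like numbers, so a field such as 012 would equal a search value of 12; appending an empty string to each side (for example `$3 "" == rec ""`) forces a string comparison if that matters for the data.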
SAMPLE INPUT FILE: (for testing purposes, I created 4 copies of the following and gzipped the files)
sh-4.2$ zcat file1
SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
SYSTEM2|SPACER|88083|SPACER|SPACER|FLORIDA|SPACER|SPACER|SPACER|MOUNTED
SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
SYSTEM2|SPACER|123141|SPACER|SPACER|NOCAL|SPACER|SPACER|SPACER|MOUNTED
SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
SYSTEM2|SPACER|90391|SPACER|SPACER|TEXAS|SPACER|SPACER|SPACER|MOUNTED
SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
SYSTEM2|SPACER|354295|SPACER|SPACER|FLORIDA|SPACER|SPACER|SPACER|MOUNTED
sh-4.2$ ls -ls
total 32
4 -rw-rw-rw- 1 host pdx 170 Dec 20 06:10 file1.gz
4 -rw-rw-rw- 1 host pdx 170 Dec 20 06:10 file2.gz
4 -rw-rw-rw- 1 host pdx 170 Dec 20 06:10 file3.gz
4 -rw-rw-rw- 1 host pdx 170 Dec 20 06:10 file4.gz
4 -rwxrwxrwx 1 host pdx 727 Dec 20 06:15 testawk
4 -rwxrwxrwx 1 host pdx 626 Dec 20 06:16 testgrep
Then I created 2 scripts:
testawk
for FILENAME in `ls file1.gz`
do
zcat $FILENAME | awk -v FS='|' -v OFS='|' '{if ($10 == "STUFFED") print $1,$3,$6,$10}' | tr -d " " >> $$.tmp
done
for TMPR in `cat $$.tmp`
do
SYSTEM=`echo $TMPR | awk -v FS='|' '{print $1}'`; export SYSTEM
RECORDNUM=`echo $TMPR | awk -v FS='|' '{print $2}'`; export RECORDNUM
LOCATION=`echo $TMPR | awk -v FS='|' '{print $3}'`; export LOCATION
PENGUINS=`echo $TMPR | awk -v FS='|' '{print $4}'`; export PENGUINS
for FILENAME in `ls fil*`
do
export FILENAME
zcat $FILENAME | awk -v FILENAME=$FILENAME -v SYSTEM=$SYSTEM -v RECORDNUM=$RECORDNUM -v LOCATION=$LOCATION -v PENGUINS=$PENGUINS -v FS="|" '/SYSTEM/ && /RECORDNUM/ && /LOCATION/ && /PENGUINS/'
done
done
and
testgrep
for FILENAME in `ls file1.gz`
do
zcat $FILENAME | awk -v FS='|' -v OFS='|' '{if ($10 == "STUFFED") print $1,$3,$6,$10}' | tr -d " " >> $$.tmp
done
for TMPR in `cat $$.tmp`
do
SYSTEM=`echo $TMPR | awk -v FS='|' '{print $1}'`; export SYSTEM
RECORDNUM=`echo $TMPR | awk -v FS='|' '{print $2}'`; export RECORDNUM
LOCATION=`echo $TMPR | awk -v FS='|' '{print $3}'`; export LOCATION
PENGUINS=`echo $TMPR | awk -v FS='|' '{print $4}'`; export PENGUINS
for FILENAME in `ls fil*`
do
echo -e "$FILENAME|\c"; zcat $FILENAME | grep "$SYSTEM" | grep "$RECORDNUM" | grep "$LOCATION" | grep "$PENGUINS"
done
done
When I execute testawk, the output is blank.
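A likely explanation for the blank output, sketched on one sample line: inside an awk program, /SYSTEM/ is a regular-expression literal that looks for the text "SYSTEM", not for the contents of the awk variable SYSTEM. A sample line contains "SYSTEM" but never the literal text "RECORDNUM", so the combined condition can never be true. Matching against a variable needs the `~` operator instead. The variable names sys and rec are made up for the example.

```shell
#!/bin/sh
# Sketch only: compare a regex literal with a match against a variable.
line='SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED'
SYSTEM='SYSTEM1'
RECORDNUM='1435'

# /RECORDNUM/ searches for the literal text RECORDNUM, which the line lacks.
literal_hit=$(printf '%s\n' "$line" |
  awk -v SYSTEM="$SYSTEM" -v RECORDNUM="$RECORDNUM" '/SYSTEM/ && /RECORDNUM/')

# $0 ~ var uses the variable's value as the pattern.
dynamic_hit=$(printf '%s\n' "$line" |
  awk -v sys="$SYSTEM" -v rec="$RECORDNUM" '$0 ~ sys && $0 ~ rec')

printf 'literal: [%s]\n' "$literal_hit"
printf 'dynamic: [%s]\n' "$dynamic_hit"
```

The first command prints nothing and the second prints the line. Because `$0 ~ sys` treats the value as a regular expression, `index($0, sys)` may be the safer choice when the search values can contain characters such as `.` or `*`.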
When I execute testgrep, the output contains all the lines where $PENGUINS is STUFFED, with the filename at the beginning of each line.
sh-4.2$ ./testgrep
file1.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
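One caveat with the grep chain, shown as a small sketch: grep matches its pattern anywhere on the line, so a record number of 12 also matches a line whose record number is 124910. The second line of the sample below is made up for the demonstration (it does not appear in the test files) and differs from the first only in field 3.

```shell
#!/bin/sh
# Sketch only: the second line is a hypothetical record, not from the test files.
sample='SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
SYSTEM1|SPACER|124910|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED'

# Substring matching: both lines contain "12" somewhere.
grep_count=$(printf '%s\n' "$sample" |
  grep "SYSTEM1" | grep "12" | grep "GEORGIA" | grep -c "STUFFED")

# Whole-field matching: only the line whose field 3 is exactly "12".
awk_count=$(printf '%s\n' "$sample" |
  awk -F'|' '$1 == "SYSTEM1" && $3 == "12" && $6 == "GEORGIA" && $10 == "STUFFED" { n++ }
             END { print n + 0 }')

printf 'grep matched %s line(s), awk matched %s line(s)\n' "$grep_count" "$awk_count"
```

Here the grep chain counts both lines while the field comparison counts one, which is the main reason to prefer comparing specific fields over chained greps for this job.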
BREAKDOWN of what I'm doing and trying to do: The following portion is the same in both scripts. It creates a file called $$.tmp containing one line for each line in file1.gz that has "STUFFED" in field 10. The file contains only the values from fields 1, 3, 6, and 10. (This file is used in the next portion of the scripts, and this part currently works.)
for FILENAME in `ls file1.gz`
do
zcat $FILENAME | awk -v FS='|' -v OFS='|' '{if ($10 == "STUFFED") print $1,$3,$6,$10}' | tr -d " " >> $$.tmp
done
The next portion of the script assigns variables for each of the 4 fields and exports the variables to be used in awk (not sure if the export is needed).
for TMPR in `cat $$.tmp`
do
SYSTEM=`echo $TMPR | awk -v FS='|' '{print $1}'`; export SYSTEM
RECORDNUM=`echo $TMPR | awk -v FS='|' '{print $2}'`; export RECORDNUM
LOCATION=`echo $TMPR | awk -v FS='|' '{print $3}'`; export LOCATION
PENGUINS=`echo $TMPR | awk -v FS='|' '{print $4}'`; export PENGUINS
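As a side note on the variable assignments above, here is a hedged sketch of an alternative: the shell's read builtin can split each key line on | into the four variables in one step, and export is not required when the values are handed to awk with -v (it would only matter if awk read them from the environment through ENVIRON). The key lines below are typed in by hand to stand in for $$.tmp, and the awk variable names are made up.

```shell
#!/bin/sh
# Sketch only: these key lines stand in for the contents of $$.tmp.
keys='SYSTEM1|1435|PHILLY|STUFFED
SYSTEM1|12|GEORGIA|STUFFED'

summary=$(printf '%s\n' "$keys" |
  while IFS='|' read -r SYSTEM RECORDNUM LOCATION PENGUINS; do
    # No export needed: -v passes each value straight to awk.
    # This awk program has only a BEGIN rule, so it reads no input
    # and leaves the remaining key lines for the while loop.
    awk -v sys="$SYSTEM" -v rec="$RECORDNUM" -v loc="$LOCATION" -v pen="$PENGUINS" \
      'BEGIN { print sys ":" rec ":" loc ":" pen }'
  done)
printf '%s\n' "$summary"
```

This replaces four echo-and-awk command substitutions per key line with a single read, which adds up when the key file has many thousands of lines.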
This portion of the script starts my for loop to check all files starting with fil for matches: (I've included both the awk and grep commands here but commented them out)
for FILENAME in `ls fil*`
do
export FILENAME
# zcat $FILENAME | awk -v FILENAME=$FILENAME -v SYSTEM=$SYSTEM -v RECORDNUM=$RECORDNUM -v LOCATION=$LOCATION -v PENGUINS=$PENGUINS -v FS="|" '/SYSTEM/ && /RECORDNUM/ && /LOCATION/ && /PENGUINS/'
# echo -e "$FILENAME|\c"; zcat $FILENAME | grep "$SYSTEM" | grep "$RECORDNUM" | grep "$LOCATION" | grep "$PENGUINS"
done
And then I end the original for loop:
done