3 actions, cat, grep then wc with info from before

Question

I have searched but found nothing so far. I am looking to list a directory, extract the unique values from each file, and then use each file name and value to count how many times that value occurs in the file.

1st ls dir

ls

file1.txt
file2.txt
file3.txt etc...

2nd grep each for unique

cat $file | awk '{print $8}' |  sort | uniq

which should output numbers

83886096
1040187393
201326673 etc...

and 3rd, use the uniq numbers found to grep the file it came from to count how many there are

cat $file | grep $output | wc -l

And somehow get a nice output with $file $output $count on lines

Thank you ahead of time

I am assuming I will have to do something of this nature, but more complicated (since I can't get it to work):

FILE="$(ls -1)"
ls > list.txt
input=list.txt
while read line
do
OUTPUT=cat ${FILE} | awk '{print $8}' |  sort | uniq 
cat ${FILE} | grep ${OUTPUT} | wc -l
done < "$input"

When I run it, it seems to kind of work. I get the following output:

grep: 0652-033 Cannot open 83886096.
       0
grep: 0652-033 Cannot open 83886096.
       0

So it found the files and read them but could not do the count

oook this is more or less clear, but could you show a [mcve] so we can work on a defined case? — fedorqui, Aug 19 '16 at 10:19
So given a set of files you want to count how many times the 8th field occurs in each file, right? Note your attempt has some syntax errors that can be traced via http://www.shellcheck.net/ . For example, to store the output in a variable you need to say `var=$(command)`, so you would say `OUTPUT=$(awk '...' file | sort | uniq)`. — fedorqui, Aug 19 '16 at 10:40

score 6 · Accepted Answer · edited May 23 '17 at 12:33

Do not parse the output of ls. Instead, just loop through the files. This way you also avoid using intermediate files:

for file in *;
do
   # things with "$file"
done
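One thing to keep in mind with a bare * is that it also matches directories, and stays a literal * when the directory is empty. A minimal sketch of a guard for that, assuming a scratch directory whose names (a b.txt, c.txt, subdir) are made up for the demo:

```shell
#!/bin/sh
# Scratch directory with made-up names: two regular files and one subdirectory.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/subdir"
: > "$tmpdir/a b.txt"
: > "$tmpdir/c.txt"

found=$(
  cd "$tmpdir" || exit 1
  for file in *; do
    # Skip directories, and the literal "*" left behind when nothing matches.
    [ -f "$file" ] || continue
    printf '<%s>\n' "$file"
  done
)
printf '%s\n' "$found"

rm -rf "$tmpdir"
```

Quoting "$file" is what keeps the name with a space in one piece, which is exactly where parsing ls output tends to break.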

Then, you are saying:

OUTPUT=cat ${FILE} | awk '{print $8}' |  sort | uniq

To start, storing the output of a command in a variable requires the syntax var=$(command). Otherwise, when you say var=command1 command2... the shell treats var=command1 as a temporary assignment and runs command2 with var set to the literal string command1 in its environment, so no output is ever captured. Then, cat file | awk '...' is equivalent to awk '...' file, so you can directly say OUTPUT=$(awk '{print $8}' "$FILE" | sort | uniq). awk can do all of this alone, but we will address this later.
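A minimal sketch of the difference, assuming a scratch file created just for the demo (demo.txt, its contents, and the greeting variable are all made up):

```shell
#!/bin/sh
# Hypothetical sample data: the 8th field is the value of interest.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'a b c d e f g 111 i' \
  'a b c d e f g 222 i' \
  'a b c d e f g 111 i' > "$tmpdir/demo.txt"

# Command substitution captures the pipeline's standard output.
OUTPUT=$(awk '{print $8}' "$tmpdir/demo.txt" | sort | uniq)
printf '%s\n' "$OUTPUT"

# By contrast, "greeting=hello sh -c ..." only sets greeting in the
# environment of that one command; the calling shell's variable stays unset.
unset greeting
seen=$(greeting=hello sh -c 'printf "%s" "$greeting"')
printf 'child saw: %s, parent has: %s\n' "$seen" "${greeting:-<unset>}"

rm -rf "$tmpdir"
```

The second half shows why OUTPUT=cat ${FILE} never fills OUTPUT: the assignment is scoped to the command that follows it.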

cat ${FILE} | grep ${OUTPUT} | wc -l

The same applies to cat here. Also, grep -c counts the matching lines by itself, so you can just say:

grep -c "$OUTPUT" "$FILE"
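Note that grep -c counts lines that contain the pattern anywhere, so a short value can also match inside a longer one or in another column. A small sketch of the difference, assuming made-up data where 8388 shows up in three places; the awk comparison is a swapped-in alternative, not part of the commands above:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
# Hypothetical data: 8388 is a prefix of 83886096 and also appears in field 2.
printf '%s\n' \
  'a b c d e f g 83886096 i' \
  'a b c d e f g 8388 i' \
  'a 8388 c d e f g 201326673 i' > "$tmpdir/demo.txt"

value=8388
# grep -c counts every line containing the text anywhere: all three lines here.
loose=$(grep -c -- "$value" "$tmpdir/demo.txt")
# awk compares the 8th field only; n+0 prints 0 when nothing matched.
exact=$(awk -v v="$value" '$8 == v {n++} END {print n+0}' "$tmpdir/demo.txt")
printf 'grep -c: %s, exact field match: %s\n' "$loose" "$exact"

rm -rf "$tmpdir"
```

If the values in your files never overlap like this, plain grep -c gives the same numbers.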

All together, it would be:

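for file in *;
do
   # unique 8th-field values of this file, one per line
   OUTPUT=$(awk '{print $8}' "$file" | sort | uniq)
   for value in $OUTPUT
   do
      # print file name, value, and the number of lines containing it
      echo "$file $value $(grep -c -- "$value" "$file")"
   done
done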

But in fact awk alone can do it:

awk '{count[$8]++} ENDFILE {print FILENAME; for (f in count) print f, count[f]; delete count}' *

This loops through all the files in the current directory and counts the number of times a given 8th field appears in each one. Then it prints a summary for every file.

Note this is GNU awk specific since it uses ENDFILE.

See some sample input/output:

$ tail f*
==> f1 <==
field1 field2 field3 field4 field5 field6 field7 xfield8 field9
field1 field2 field3 field4 field5 field6 field7 yfield8 field9
field1 field2 field3 field4 field5 field6 field7 yfield8 field9
field1 field2 field3 field4 field5 field6 field7 zfield8 field9

==> f2 <==
field1 field2 field3 field4 field5 field6 field7 xfield8 field9
field1 field2 field3 field4 field5 field6 field7 yfield8 field9
field1 field2 field3 field4 field5 field6 field7 zfield8 field9
field1 field2 field3 field4 field5 field6 field7 zfield8 field9

==> f3 <==
field1 field2 field3 field4 field5 field6 field7 xfield8 field9
field1 field2 field3 field4 field5 field6 field7 xfield8 field9
field1 field2 field3 field4 field5 field6 field7 xfield8 field9
field1 field2 field3 field4 field5 field6 field7 yfield8 field9
field1 field2 field3 field4 field5 field6 field7 yfield8 field9
field1 field2 field3 field4 field5 field6 field7 zfield8 field9
$ awk '{count[$8]++} ENDFILE {print FILENAME; for (f in count) print f, count[f]; delete count}' f*
f1
xfield8 1
yfield8 2
zfield8 1
f2
xfield8 1
yfield8 1
zfield8 2
f3
xfield8 3
yfield8 2
zfield8 1
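Since ENDFILE is a GNU awk extension, here is a sketch of one way to get the same counts from any POSIX awk, printing one "file value count" line per pair. It assumes two small made-up files (f1, f2) and file names without spaces:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
# Hypothetical sample files with the value of interest in field 8.
printf '%s\n' '1 2 3 4 5 6 7 x 9' '1 2 3 4 5 6 7 y 9' '1 2 3 4 5 6 7 y 9' > "$tmpdir/f1"
printf '%s\n' '1 2 3 4 5 6 7 x 9' '1 2 3 4 5 6 7 z 9' > "$tmpdir/f2"

# Key the counter on file name plus value, then print everything at the end.
# The order of "for (k in count)" is unspecified, so sort the result.
result=$(
  cd "$tmpdir" || exit 1
  awk '
    { count[FILENAME " " $8]++ }
    END { for (k in count) print k, count[k] }
  ' f1 f2 | LC_ALL=C sort
)
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The trade-off is that all counts are held in memory until the last file has been read, whereas the ENDFILE version clears the array after each file.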

Nice. You should just add it's gawk-specific due to ENDFILE. — Ed Morton, Aug 19 '16 at 12:24

score 1 · Answer 2 · edited Aug 19 '16 at 13:30

Using @fedorqui's data (thanks for providing it):

$ for i in f[123]; do echo "$i:"; cut -d \  -f 8 "$i" |sort|uniq -c; done
f1:
      1 xfield8
      2 yfield8
      1 zfield8
f2:
      1 xfield8
      1 yfield8
      2 zfield8
f3:
      3 xfield8
      2 yfield8
      1 zfield8
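If you would rather have file, value and count on a single line, the uniq -c output can be reordered with a small awk step. A sketch under the same assumptions (space-separated fields, made-up files f1 and f2):

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
# Hypothetical sample files with the value of interest in field 8.
printf '%s\n' '1 2 3 4 5 6 7 x 9' '1 2 3 4 5 6 7 y 9' '1 2 3 4 5 6 7 y 9' > "$tmpdir/f1"
printf '%s\n' '1 2 3 4 5 6 7 x 9' '1 2 3 4 5 6 7 z 9' > "$tmpdir/f2"

lines=$(
  cd "$tmpdir" || exit 1
  for i in f1 f2; do
    # uniq -c prints "count value"; awk swaps the columns and prepends the file name.
    cut -d ' ' -f 8 "$i" | sort | uniq -c | awk -v f="$i" '{print f, $2, $1}'
  done
)
printf '%s\n' "$lines"

rm -rf "$tmpdir"
```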

answered Aug 19 '16 at 13:01 by James Brown · edited Aug 19 '16 at 13:30 by fedorqui

score 0 · Answer 3 · answered Aug 19 '16 at 14:14

Maybe this is not at all the kind of answer you are looking for, but I'll try anyway: I'd advise you to write a command that prints each file name followed by its content, and redirect that into a log file, which will look like this:

file1 content1
file1 content1
file1 content2
file2 content1
file2 content2
file2 content2
file2 content2
...

You then import this file into Excel, and using a subtotal or other data analysis feature you can get the job done.
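A sketch of one way such a log file could be produced, assuming the content of interest is the 8th field and using made-up file names (f1, f2, and a log file called values.log):

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
# Hypothetical sample files with the value of interest in field 8.
printf '%s\n' '1 2 3 4 5 6 7 x 9' '1 2 3 4 5 6 7 y 9' > "$tmpdir/f1"
printf '%s\n' '1 2 3 4 5 6 7 z 9' > "$tmpdir/f2"

# FILENAME is the file awk is currently reading; print it next to field 8.
(cd "$tmpdir" && awk '{print FILENAME, $8}' f1 f2 > values.log)
log=$(cat "$tmpdir/values.log")
printf '%s\n' "$log"

rm -rf "$tmpdir"
```

The resulting space-separated file can then be imported as delimited text.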

score -1 · Answer 4 · answered Aug 19 '16 at 10:40

I think this is what you are trying to do

ls | awk '{print "> "$1; system("cat "$1" | cut -f8 | sort | uniq");}' | awk '{if($1==">"){ Filename=$2; next;} printf Filename" "$1" ";system("cat "Filename" | grep "$1" | wc -l") }'

I do not know what your field delimiter is. Note that cut splits on tabs by default, so for space-separated fields you would need to pass the delimiter with -d.

answered Aug 19 '16 at 10:40 by FoldedChromatin

That is the wrong approach in just about every way and will fail given various file names and contents. – Ed Morton Aug 19 '16 at 12:24
