1

I have searched but found nothing so far. I am looking to list a directory, extract the unique values from each file, and then use the file name and each value to count how many times that value occurs in the file.

1st ls dir

ls
  • file1.txt
  • file2.txt
  • file3.txt etc...

2nd grep each for unique

cat $file | awk '{print $8}' |  sort | uniq 

which should output numbers

  • 83886096
  • 1040187393
  • 201326673 etc...

and 3rd, use the uniq numbers found to grep the file it came from to count how many there are

cat $file | grep $output | wc -l

And somehow get a nice output with $file $output $count on lines

Thank you ahead of time

I am assuming I will have to do something of this nature but more complicated (since I can't get it to work):

FILE="$(ls -1)"
ls > list.txt
input=list.txt
while read line
do
OUTPUT=cat ${FILE} | awk '{print $8}' |  sort | uniq 
cat ${FILE} | grep ${OUTPUT} | wc -l
done < "$input"

When I run it, it seems to kind of work. I get the following output:

grep: 0652-033 Cannot open 83886096.
       0
grep: 0652-033 Cannot open 83886096.
       0

So it found the files and read them but could not do the count

fedorqui
darbs
  • oook this is more or less clear, but could you show a [mcve] so we can work on a defined case? – fedorqui Aug 19 '16 at 10:19
  • what do you mean by defined case – darbs Aug 19 '16 at 10:25
  • some sample input and desired output – fedorqui Aug 19 '16 at 10:27
  • So given a set of files you want to count how many times the 8th field occurs in each file, right? Note your attempt has some syntax errors that can be traced via http://www.shellcheck.net/ . For example, to store the output in a variable you need to say `var=$(command)`, so you would say `OUTPUT=$(awk '...' file | sort | uniq)`. – fedorqui Aug 19 '16 at 10:40

4 Answers

6

Do not parse the output of ls. Instead, just loop through the files. This way you also avoid using intermediate files:

for file in *;
do
   # things with "$file"
done
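
As a minimal sketch of that loop, the function below prints every regular file in a directory. The name list_regular_files is a hypothetical helper, not something from the question; the -f test is there because a bare * also matches directories and stays as a literal * when the directory is empty.

```shell
# list_regular_files DIR: print each regular file in DIR, one per line.
# (list_regular_files is a hypothetical helper name used for illustration.)
list_regular_files() {
    for file in "$1"/*; do
        # Skip directories and the literal pattern left behind when nothing matches.
        [ -f "$file" ] || continue
        printf '%s\n' "$file"
    done
}
```

Quoting "$file" everywhere keeps names containing spaces intact, which is one of the cases where parsing ls output goes wrong.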

Then, you are saying:

OUTPUT=cat ${FILE} | awk '{print $8}' |  sort | uniq 

To start, storing the output of a command in a variable requires the syntax var=$(command). Otherwise, when you say var=command1 command2..., the shell sets var to the literal string command1 only for the duration of command2, which it then tries to run as the command. Then, cat file | awk '...' is equivalent to awk '...' file, so you can directly say OUTPUT=$(awk '{print $8}' "$FILE" | sort | uniq). awk can do all of this alone, but we will address this later.

cat ${FILE} | grep ${OUTPUT} | wc -l

The same applies to cat here. Also, grep -c counts the matching lines itself, so you can just say:

grep -c "$OUTPUT" "$FILE"
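
One thing to keep in mind is that grep matches the pattern anywhere on the line, so a short value such as 8388 is also counted on lines that contain 83886096. The sketch below contrasts the two behaviours; both function names are hypothetical, and the awk variant assumes whitespace-separated fields as in the question.

```shell
# count_lines_matching VALUE FILE: lines of FILE containing VALUE anywhere.
# -F treats VALUE as a fixed string, -- protects values that start with a dash.
count_lines_matching() {
    grep -c -F -- "$1" "$2"
}

# count_field8_equal VALUE FILE: lines of FILE whose 8th field is exactly VALUE.
# Appending "" to both sides forces a string comparison instead of a numeric one.
count_field8_equal() {
    awk -v want="$1" '($8 "") == (want "") { n++ } END { print n + 0 }' "$2"
}
```

If the values are all the same length, or never appear in other fields, the two counts agree and plain grep -c is enough.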

All together, it would be:

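for file in *;
do
   OUTPUT=$(awk '{print $8}' "$file" | sort | uniq)
   # OUTPUT holds one value per line, so count each value separately
   for value in $OUTPUT
   do
      echo "$file $value $(grep -c -- "$value" "$file")"
   done
done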

But in fact awk alone can do it:

awk '{count[$8]++} ENDFILE {print FILENAME; for (f in count) print f, count[f]; delete count}' *

This loops through all the files in the current directory and counts the number of times a given 8th field appears in each one. Then it prints a summary for every file.

Note this is GNU awk specific since it uses ENDFILE.
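
Where GNU awk is not available, the same per-file summary can probably be had without ENDFILE by noticing when FNR goes back to 1. This is only a sketch: count_field8 and report are hypothetical names, and it prints "file value count" on one line (the layout asked for in the question) rather than a file header followed by the counts.

```shell
# count_field8 FILE...: print "file value count" for each distinct 8th field.
# (count_field8 and report are hypothetical names used for illustration.)
count_field8() {
    awk '
        # Print the tallies gathered for the file just finished, then reset them.
        function report(    v) {
            for (v in count) print fname, v, count[v]
            split("", count)          # portable way to empty an array
        }
        FNR == 1 && NR > 1 { report() }   # first line of a later file
        { fname = FILENAME; count[$8]++ }
        END { report() }
    ' "$@"
}
```

The order in which awk walks the array is unspecified, so pipe the result through sort if a stable order matters, for example count_field8 f1 f2 f3 | sort.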

See some sample input/output:

$ tail f*
==> f1 <==
field1 field2 field3 field4 field5 field6 field7 xfield8 field9
field1 field2 field3 field4 field5 field6 field7 yfield8 field9
field1 field2 field3 field4 field5 field6 field7 yfield8 field9
field1 field2 field3 field4 field5 field6 field7 zfield8 field9

==> f2 <==
field1 field2 field3 field4 field5 field6 field7 xfield8 field9
field1 field2 field3 field4 field5 field6 field7 yfield8 field9
field1 field2 field3 field4 field5 field6 field7 zfield8 field9
field1 field2 field3 field4 field5 field6 field7 zfield8 field9

==> f3 <==
field1 field2 field3 field4 field5 field6 field7 xfield8 field9
field1 field2 field3 field4 field5 field6 field7 xfield8 field9
field1 field2 field3 field4 field5 field6 field7 xfield8 field9
field1 field2 field3 field4 field5 field6 field7 yfield8 field9
field1 field2 field3 field4 field5 field6 field7 yfield8 field9
field1 field2 field3 field4 field5 field6 field7 zfield8 field9
$ awk '{count[$8]++} ENDFILE {print FILENAME; for (f in count) print f, count[f]; delete count}' f*
f1
xfield8 1
yfield8 2
zfield8 1
f2
xfield8 1
yfield8 1
zfield8 2
f3
xfield8 3
yfield8 2
zfield8 1
Community
fedorqui
1

Using @fedorqui's data (thanks for providing it):

$ for i in f[123]; do echo "$i:"; cut -d \  -f 8 "$i" |sort|uniq -c; done
f1:
      1 xfield8
      2 yfield8
      1 zfield8
f2:
      1 xfield8
      1 yfield8
      2 zfield8
f3:
      3 xfield8
      2 yfield8
      1 zfield8
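
To get the "$file $output $count" layout the question asks for, the uniq -c output can be reshaped with a small awk step. A sketch under the same assumption as above, that fields are separated by single spaces; field8_counts is a hypothetical name.

```shell
# field8_counts FILE...: one "file value count" line per distinct 8th field.
# (field8_counts is a hypothetical helper name used for illustration.)
field8_counts() {
    for f in "$@"; do
        # uniq -c prints "count value"; awk swaps the columns and adds the file name.
        cut -d ' ' -f 8 "$f" | sort | uniq -c |
            awk -v name="$f" '{ print name, $2, $1 }'
    done
}
```

For example, field8_counts f[123] prints lines such as "f1 yfield8 2".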
fedorqui
James Brown
0

Maybe this answer is not at all what you're looking for, but I'll try anyway: I'd advise you to write a command that prints the name of each file followed by its content. You can redirect this into a logfile, which will look like this:

file1 content1
file1 content1
file1 content2
file2 content1
file2 content2
file2 content2
file2 content2
...

You then import this file into Excel, and using a subtotal or other data analysis feature you can get the job done.
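
One way such a logfile could be produced is with awk's built-in FILENAME variable. This is a sketch that assumes the "content" of interest is the 8th whitespace-separated field from the question; filename_and_field8 is a hypothetical name.

```shell
# filename_and_field8 FILE...: print "file value" for every input line.
# (filename_and_field8 is a hypothetical helper name used for illustration.)
filename_and_field8() {
    awk '{ print FILENAME, $8 }' "$@"
}
```

Running filename_and_field8 file1.txt file2.txt > logfile gives the layout shown above, and piping the same output through sort | uniq -c yields the counts without leaving the shell.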

Dominique
-1

I think this is what you are trying to do

ls | awk '{print "> "$1; system("cat "$1" | cut -d\" \" -f8 | sort | uniq");}' | awk '{if($1==">"){ Filename=$2; next;} printf "%s %s ", Filename, $1; system("cat "Filename" | grep "$1" | wc -l") }'

I do not know what your field delimiter is, so I am assuming a single space, which cut can handle with -d.

FoldedChromatin
  • 1
    That is the wrong approach in just about every way and will fail given various file names and contents. – Ed Morton Aug 19 '16 at 12:24