Bash: Find file with max lines count

Question

This is my attempt:

Find all *.java files
find . -name '*.java'
Count lines
wc -l
Delete the last line (wc's "total" line)
sed '$d'
Use awk to find the maximum line count in the wc output
awk 'max=="" || data=="" || $1 > max {max=$1 ; data=$2} END{ print max " " data}'

Then combine everything into a single pipeline:

find . -name '*.java' | xargs wc -l | sed '$d' | awk 'max=="" || data=="" || $1 > max {max=$1 ; data=$2} END{ print max " " data}'

Can I count only the non-blank lines instead?

Your solution as is will probably fall over when it encounters unusual file names. Use `-print0` in `find` together with the `-0` option of `xargs`, something like this: `find . -name '*.java' -print0 | xargs -0 wc -l | sort -n | tail -2 | head -1` – potong, Dec 14 '11 at 10:13

Shawn Chin · Accepted Answer · score 25 · 2012-11-26T11:20:53.827

find . -type f -name "*.java" -exec grep -H -c '[^[:space:]]' {} \; | \
    sort -nr -t":" -k2 | awk -F: '{print $1; exit;}'

Replace the awk command with head -n1 if you also want to see the number of non-blank lines.

Breakdown of the command:

find . -type f -name "*.java" -exec grep -H -c '[^[:space:]]' {} \; 
'---------------------------'       '-----------------------'
             |                                   |
   for each *.java file             Use grep to count non-blank lines
                                   -H includes filenames in the output
                                 (output = ./full/path/to/file.java:count)

| sort -nr -t":" -k2  | awk -F: '{print $1; exit;}'
  '----------------'    '-------------------------'
          |                            |
  Sort the output in         Print filename of the first entry (largest count)
reverse order using the         then exit immediately
  second column (count)
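A possible variant, sketched under the assumption that your `find` supports the POSIX `-exec ... {} +` form: grep is then started once per batch of files instead of once per file, and `-H` still guarantees `path:count` output when a batch happens to contain a single file. The function name `most_nonblank_java` is made up for illustration, and the colon caveat raised in the comments below still applies.

```bash
# Print "path:count" for the *.java file under $1 (default ".") with the
# most non-blank lines. grep -c counts lines containing at least one
# non-whitespace character; -H keeps the file name on every output line.
most_nonblank_java() {
    find "${1:-.}" -type f -name '*.java' \
        -exec grep -H -c '[^[:space:]]' {} + \
        | sort -nr -t: -k2 \
        | head -n 1
}
```

Calling `most_nonblank_java src` prints a single `path:count` line; append the original `awk -F: '{print $1; exit;}'` instead of `head -n 1` if you only want the name.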

edited Nov 26 '12 at 11:20

answered Dec 13 '11 at 12:28

Shawn Chin


Great, I like this one more, because it introduced me to the `find -exec` option, which is more useful than looping – Marek Sebera Dec 13 '11 at 12:32
It'd fail for file names that contain colons or newlines. – Ed Morton Nov 26 '12 at 22:28
@EdMorton your filenames contain newlines? – Marek Sebera May 18 '23 at 19:32
@MarekSebera I personally don't deliberately create file names containing newlines, but I do come across them on various systems I work on. Assuming your software will never have to work when file names contain newlines is one of the ways your code can fail and be exploited. Using `-print0` and `xargs -0` as [@potong suggested](https://stackoverflow.com/questions/8488301/bash-find-file-with-max-lines-count/8489243?noredirect=1#comment10523674_8488301) and as I did in [my answer](https://stackoverflow.com/a/13574146/1745001) is one way to help you avoid such problems. – Ed Morton May 18 '23 at 20:45
So the 3 answers to `your filenames contain newlines?` are - a) no, not the filenames I manually create, b) maybe, e.g. filenames I create as a result of being required by a customer to write a tool to create file names from input that itself can contain newlines, e.g. fields in a CSV exported from Excel and c) yes, the filenames I don't create but that can exist on machines my software runs on. – Ed Morton May 18 '23 at 20:56

score 17 · Answer 2 · edited Aug 03 '18 at 18:39


find . -name "*.java" -type f | xargs wc -l | sort -rn | grep -v ' total$' | head -1

edited Aug 03 '18 at 18:39

Dan Oak


answered Dec 13 '11 at 12:21

Vijay


Not bad, but it needs an edit to show only the file with the most lines of code; right now it shows all files with their counts – Marek Sebera Dec 13 '11 at 12:32
Yeah, you are right. I just forgot to add one more pipe. Added now. – Vijay Dec 14 '11 at 05:38
Super helpful for getting a "top 10" of the files with the most lines, by changing `head -1` to `head -10` – anaotha Nov 01 '22 at 08:27
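Combining that "top N" idea with the `-print0`/`-0` advice from the question's comments gives something like the sketch below; the function name `top_java_by_lines` and its argument order are hypothetical. Filtering with `grep -v ' total$'` rather than dropping a fixed line also copes with `xargs` running `wc` more than once on a long file list, since each run given two or more files prints its own `total` line.

```bash
# List the $1 (default 1) *.java files under $2 (default ".") with the most
# lines, largest first, as "count path" lines from wc.
top_java_by_lines() {
    local n=${1:-1} dir=${2:-.}
    # -print0 / -0 pass names containing spaces or quotes through unchanged.
    find "$dir" -type f -name '*.java' -print0 \
        | xargs -0 wc -l \
        | grep -v ' total$' \
        | sort -nr \
        | head -n "$n"
}
```

For example, `top_java_by_lines 10 .` mirrors the `head -10` variant from the comment.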

score 0 · Answer 3 · answered Nov 26 '12 at 22:20

Getting the line count of every one of your files with awk is just:

$ find . -name '*.java' -print0 | xargs -0 awk '
BEGIN { for (i=1;i<ARGC;i++) size[ARGV[i]]=0 }
{ size[FILENAME]++ }
END { for (file in size) print size[file], file }
'

To get the count of the non-empty lines, simply make the line where you increment the size[] conditional:

$ find . -name '*.java' -print0 | xargs -0 awk '
BEGIN { for (i=1;i<ARGC;i++) size[ARGV[i]]=0 }
NF { size[FILENAME]++ }
END { for (file in size) print size[file], file }
'

(`NF` is zero for lines that are empty or contain only blanks, so both are skipped. If you want lines that contain only blanks to count as non-empty, replace NF with /^./.)
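A small check of the difference between the two conditions, assuming a POSIX awk with its default field separator (runs of blanks and tabs). The helper names `count_nf` and `count_chars` are hypothetical.

```bash
# NF is 0 on a line that is empty or holds only blanks/tabs, so `NF` skips both.
count_nf()    { awk 'NF   { n++ } END { print n+0 }'; }
# /^./ matches any line with at least one character, blanks included.
count_chars() { awk '/^./ { n++ } END { print n+0 }'; }

# Sample input: "code", a line of three spaces, an empty line, "more code".
printf 'code\n   \n\nmore code\n' | count_nf      # prints 2
printf 'code\n   \n\nmore code\n' | count_chars   # prints 3
```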

To get only the file with the most non-empty lines, just tweak it again:

$ find . -name '*.java' -print0 | xargs -0 awk '
BEGIN { for (i=1;i<ARGC;i++) size[ARGV[i]]=0 }
NF { size[FILENAME]++ }
END {
   for (file in size) {
      if (size[file] >= maxSize) {
         maxSize = size[file]
         maxFile = file
      }
   }
   print maxSize, maxFile
}
'
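One caveat, shown here as a sketch (the wrapper name `most_nonblank_awk` is hypothetical): for a very long file list, `xargs` may split the names across several `awk` runs, and each run then prints its own winner. Piping the result through `sort -nr | head -n 1` reduces those per-run lines to one; the awk program itself is unchanged apart from layout.

```bash
# Print "count path" for the *.java file under $1 (default ".") with the
# most non-empty lines, even if xargs starts awk more than once.
most_nonblank_awk() {
    find "${1:-.}" -name '*.java' -print0 | xargs -0 awk '
    BEGIN { for (i = 1; i < ARGC; i++) size[ARGV[i]] = 0 }
    NF    { size[FILENAME]++ }
    END {
        for (file in size) {
            if (size[file] >= maxSize) {
                maxSize = size[file]
                maxFile = file
            }
        }
        print maxSize, maxFile
    }' | sort -nr | head -n 1   # one winner per awk run: keep the largest
}
```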

holygeek · Answer 4 · score 0 · 2012-12-12T13:21:01.453

Something like this might work:

find . -name '*.java' | while IFS= read -r filename; do
    # count lines that are neither empty nor whitespace-only
    nlines=$(grep -c -v -E '^[[:space:]]*$' "$filename")
    echo "$nlines $filename"
done | sort -nr | head -1

(edited as per Ed Morton's comment. I must have had too much coffee :-) )
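A NUL-delimited variant of this loop, sketched for bash only (it relies on process substitution, `read -d ''` and `(( ))`); the function name `most_nonblank_loop` is hypothetical. Feeding the loop from `find -print0` keeps names with spaces, backslashes or newlines intact, and tracking the maximum in shell variables avoids passing the names through `sort`, which works line by line.

```bash
# Print "count path" for the *.java file under $1 (default ".") with the
# most lines that are neither empty nor whitespace-only. Requires bash.
most_nonblank_loop() {
    local file count max=-1 best=
    while IFS= read -r -d '' file; do
        # grep -c -v prints 0 (and exits 1) when every line is blank;
        # "|| true" keeps that from aborting scripts run with set -e.
        count=$(grep -c -v -E '^[[:space:]]*$' -- "$file" || true)
        if (( count > max )); then
            max=$count
            best=$file
        fi
    done < <(find "${1:-.}" -type f -name '*.java' -print0)
    if (( max >= 0 )); then
        printf '%s %s\n' "$max" "$best"
    fi
}
```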

edited Dec 12 '12 at 13:21

answered Dec 13 '11 at 11:30

holygeek

